index : chocula (master)
path: root/chocula
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| more HomepageUrl filtering | Bryan Newbold | 2021-11-24 | 1 | -0/+3 |
| codespell fix minor typos (there are some more in actual code | Bryan Newbold | 2021-11-24 | 3 | -5/+5 |
| add openalex directory source | Bryan Newbold | 2021-11-22 | 2 | -0/+70 |
| make fmt | Bryan Newbold | 2021-04-23 | 8 | -11/+28 |
| doaj: updates for new file format; removed some fields/metadata | Bryan Newbold | 2021-04-23 | 1 | -55/+43 |
| SIM: cap maximum year of coverage | Bryan Newbold | 2020-12-07 | 1 | -0/+3 |
| database support for scholarsportal and cariniana preservation holdings | Bryan Newbold | 2020-10-08 | 3 | -1/+72 |
| vanished_inactive: more tolerant handling of unicode BOM | Bryan Newbold | 2020-10-08 | 1 | -1/+2 |
| util: parse ISSN format with extra spaces | Bryan Newbold | 2020-09-13 | 1 | -0/+2 |
| update vanished journal importer for 2020-09-03 dataset | Bryan Newbold | 2020-09-13 | 2 | -30/+18 |
| do not create hathitrust-only journal rows | Bryan Newbold | 2020-09-02 | 1 | -1/+2 |
| hathitrust KBART-style importer | Bryan Newbold | 2020-09-02 | 4 | -2/+106 |
| include pkp_pln as a kbart directory in summarization/export/etc | Bryan Newbold | 2020-08-31 | 1 | -1/+1 |
| fmt | Bryan Newbold | 2020-08-31 | 3 | -12/+29 |
| add support for PKP PLN (KBART-like) | Bryan Newbold | 2020-08-31 | 3 | -1/+57 |
| fatcat export improvements | Bryan Newbold | 2020-08-03 | 1 | -9/+28 |
| more blocked URLs and domains | Bryan Newbold | 2020-08-03 | 1 | -0/+29 |
| directories: all extra metadata in top-level dict | Bryan Newbold | 2020-08-03 | 4 | -13/+9 |
| sim: some flag fields as boolean | Bryan Newbold | 2020-08-03 | 1 | -2/+12 |
| doaj bug: wasn't setting extra directory metadata | Bryan Newbold | 2020-08-03 | 1 | -9/+8 |
| remove trailing whitespace from comment | Bryan Newbold | 2020-06-25 | 1 | -7/+7 |
| add MAG importer; reorder directory class listing | Bryan Newbold | 2020-06-23 | 2 | -10/+73 |
| block some meta strings | Bryan Newbold | 2020-06-23 | 1 | -0/+3 |
| skip umi.com in addition to www.umi.com | Bryan Newbold | 2020-06-23 | 1 | -0/+1 |
| road: proper language parsing | Bryan Newbold | 2020-06-23 | 1 | -2/+6 |
| ensure lang is len()==2; prep for original_name column | Bryan Newbold | 2020-06-23 | 1 | -0/+5 |
| make fmt | Bryan Newbold | 2020-06-23 | 1 | -34/+39 |
| tests and fixes for parse_lang(), parse_country() | Bryan Newbold | 2020-06-23 | 1 | -19/+78 |
| block/skip more homepage patterns | Bryan Newbold | 2020-06-23 | 1 | -0/+9 |
| fix langs inclusion in summarization; remove unused/duplicate fields | Bryan Newbold | 2020-06-23 | 1 | -2/+2 |
| strip control characters from titles (issn_meta) | Bryan Newbold | 2020-06-23 | 1 | -0/+4 |
| fix issn_meta country detection | Bryan Newbold | 2020-06-23 | 1 | -5/+8 |
| improve lang parsing | Bryan Newbold | 2020-06-23 | 5 | -7/+11 |
| issn_meta: mainTitle can be an array | Bryan Newbold | 2020-06-23 | 1 | -1/+4 |
| set is_active flag based on directories | Bryan Newbold | 2020-06-23 | 1 | -0/+5 |
| sources, ISSN-L test mappings, __init__ for recent importers | Bryan Newbold | 2020-06-23 | 1 | -0/+12 |
| ZDB homepage (FIZE) scrape importer | Bryan Newbold | 2020-06-23 | 1 | -0/+34 |
| australian ERA journal list importer | Bryan Newbold | 2020-06-23 | 1 | -0/+54 |
| vanished journal metadata importer | Bryan Newbold | 2020-06-23 | 2 | -0/+113 |
| ISSN portal metadata directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+61 |
| AWOL directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+76 |
| filter out more meta/index URL hosts | Bryan Newbold | 2020-06-23 | 1 | -1/+15 |
| Revert "EZB color not a good proxy for OA status" | Bryan Newbold | 2020-06-23 | 1 | -0/+2 |
| new manual homepage source | Bryan Newbold | 2020-06-23 | 2 | -0/+49 |
| be more careful with sherpa/romeo color summarization | Bryan Newbold | 2020-06-22 | 1 | -3/+4 |
| EZB color not a good proxy for OA status | Bryan Newbold | 2020-06-22 | 1 | -2/+0 |
| additional small flake8 fixes | Bryan Newbold | 2020-06-22 | 2 | -2/+2 |
| flake8 cleanups | Bryan Newbold | 2020-06-22 | 7 | -17/+9 |
| norwegian: fixes from bugs flake8 helped find | Bryan Newbold | 2020-06-22 | 1 | -3/+2 |
| fmt (black) | Bryan Newbold | 2020-06-22 | 21 | -613/+766 |