path: root/chocula.py
Commit message | Author | Age | Files | Lines
* start refactoring files into module | Bryan Newbold | 2020-05-06 | 1 | -1469/+0
* update to new(er) ISSN-L mapping file | Bryan Newbold | 2020-05-01 | 1 | -1/+1
* update URL crawl status snapshot | Bryan Newbold | 2019-12-26 | 1 | -1/+1
* add stats and URL crawl status files | Bryan Newbold | 2019-12-24 | 1 | -2/+3
* update chocula usage of argparse | Bryan Newbold | 2019-12-23 | 1 | -14/+22
* update norwegian CSV importer schema | Bryan Newbold | 2019-12-23 | 1 | -2/+4
* update chocula input data files | Bryan Newbold | 2019-12-23 | 1 | -10/+10
* use newer fatcat contianer dump | Bryan Newbold | 2019-09-06 | 1 | -1/+1
* filter out bad ISSN{e,p} | Bryan Newbold | 2019-09-06 | 1 | -0/+5
* last name/publisher cleanups | Bryan Newbold | 2019-09-03 | 1 | -2/+6
* don't include doaj.org or NCBI homepage URLs | Bryan Newbold | 2019-09-03 | 1 | -0/+4
* improve fatcat_export metadata quality | Bryan Newbold | 2019-09-03 | 1 | -3/+12
* fix SZCEPANSKI typo | Bryan Newbold | 2019-09-03 | 1 | -2/+2
* improve export_fatcat | Bryan Newbold | 2019-08-28 | 1 | -5/+22
* only fatcat_export 'valid' (syntax) ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -1/+1
* include Szczepanski in everything command (oops) | Bryan Newbold | 2019-08-27 | 1 | -0/+1
* updated crossref title file; ISSN-L file link | Bryan Newbold | 2019-08-27 | 1 | -1/+1
* update IA_CRAWL_FILE | Bryan Newbold | 2019-07-31 | 1 | -1/+1
* webarchive_urls separate from regular URLs | Bryan Newbold | 2019-07-31 | 1 | -1/+21
* add 'export_fatcat' | Bryan Newbold | 2019-07-31 | 1 | -1/+51
* handle 'ttp://' URL prefix corner case | Bryan Newbold | 2019-07-31 | 1 | -0/+2
* iterate on homepage url import/stats | Bryan Newbold | 2019-07-31 | 1 | -18/+40
* chocula: sherpa_color in summary; cleanups | Bryan Newbold | 2019-07-30 | 1 | -5/+9
* chocula: openapc | Bryan Newbold | 2019-07-30 | 1 | -1/+31
* chocula: json export | Bryan Newbold | 2019-07-30 | 1 | -0/+17
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 1 | -2/+3
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 1 | -0/+2
* chocula: better ISSN-L handling | Bryan Newbold | 2019-07-30 | 1 | -11/+16
* chocula: updated fetches, new ISSN-L and DOAJ files | Bryan Newbold | 2019-07-30 | 1 | -3/+3
* chocula: wikidata indexing | Bryan Newbold | 2019-07-30 | 1 | -4/+48
* chocula: crude publisher type bucketing; field cleanup | Bryan Newbold | 2019-07-30 | 1 | -20/+164
* shorter/simpler table names | Bryan Newbold | 2019-07-26 | 1 | -7/+15
* chocula: more host/domain fixes | Bryan Newbold | 2019-07-26 | 1 | -3/+8
* GOLD OA parsing | Bryan Newbold | 2019-07-26 | 1 | -40/+54
* chocula: fix domain parsing | Bryan Newbold | 2019-07-26 | 1 | -10/+47
* more chocula progress | Bryan Newbold | 2019-07-14 | 1 | -57/+171
* EZB and szczepanski indexers | Bryan Newbold | 2019-07-11 | 1 | -45/+146
* chocula early work | Bryan Newbold | 2019-07-10 | 1 | -0/+798