Commit message | Author | Age | Files | Lines
* use newer fatcat contianer dump [HEAD, master] | Bryan Newbold | 2019-09-06 | 2 | -1/+3
* filter out bad ISSN{e,p} | Bryan Newbold | 2019-09-06 | 1 | -0/+5
* last name/publisher cleanups | Bryan Newbold | 2019-09-03 | 1 | -2/+6
* update TODO | Bryan Newbold | 2019-09-03 | 1 | -1/+10
* don't include doaj.org or NCBI homepage URLs | Bryan Newbold | 2019-09-03 | 1 | -0/+4
* improve fatcat_export metadata quality | Bryan Newbold | 2019-09-03 | 1 | -3/+12
* fix SZCEPANSKI typo | Bryan Newbold | 2019-09-03 | 1 | -2/+2
* improve export_fatcat | Bryan Newbold | 2019-08-28 | 1 | -5/+22
* python script to fix fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+75
* hand-coded corrections to invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -88/+88
* current invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+118
* only fatcat_export 'valid' (syntax) ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -1/+1
* include Szczepanski in everything command (oops) | Bryan Newbold | 2019-08-27 | 1 | -0/+1
* updated crossref title file; ISSN-L file link | Bryan Newbold | 2019-08-27 | 3 | -3/+3
* update IA_CRAWL_FILE | Bryan Newbold | 2019-07-31 | 1 | -1/+1
* commit TODO list | Bryan Newbold | 2019-07-31 | 1 | -0/+37
* update fetch.sh with url_status files | Bryan Newbold | 2019-07-31 | 1 | -0/+3
* webarchive_urls separate from regular URLs | Bryan Newbold | 2019-07-31 | 1 | -1/+21
* don't return 'error' for bad CDX lookups | Bryan Newbold | 2019-07-31 | 1 | -1/+3
* add 'export_fatcat' | Bryan Newbold | 2019-07-31 | 1 | -1/+51
* README update | Bryan Newbold | 2019-07-31 | 1 | -21/+35
* more check_issn_urls corner-cases | Bryan Newbold | 2019-07-31 | 1 | -1/+5
* handle 'ttp://' URL prefix corner case | Bryan Newbold | 2019-07-31 | 1 | -0/+2
* broader top-level gitignore | Bryan Newbold | 2019-07-31 | 1 | -0/+25
* remove python 3.5 constraint | Bryan Newbold | 2019-07-31 | 2 | -6/+4
* pipenv: datasette | Bryan Newbold | 2019-07-31 | 2 | -1/+145
* add wikidata SPARQL query | Bryan Newbold | 2019-07-31 | 1 | -0/+35
* sqlite-notebook template for basic chocula stats | Bryan Newbold | 2019-07-31 | 2 | -0/+186
* iterate on homepage url import/stats | Bryan Newbold | 2019-07-31 | 2 | -21/+43
* more issn URL checker fixes | Bryan Newbold | 2019-07-31 | 2 | -11/+27
* major improvements to ISSN URL checker | Bryan Newbold | 2019-07-30 | 1 | -20/+121
* import vanilla ISSN url checker script | Bryan Newbold | 2019-07-30 | 1 | -0/+52
* chocula: sherpa_color in summary; cleanups | Bryan Newbold | 2019-07-30 | 3 | -6/+12
* chocula: openapc | Bryan Newbold | 2019-07-30 | 1 | -1/+31
* chocula: json export | Bryan Newbold | 2019-07-30 | 1 | -0/+17
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 1 | -2/+3
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 2 | -1/+3
* chocula: better ISSN-L handling | Bryan Newbold | 2019-07-30 | 4 | -24/+41
* chocula: updated fetches, new ISSN-L and DOAJ files | Bryan Newbold | 2019-07-30 | 2 | -7/+10
* chocula: wikidata indexing | Bryan Newbold | 2019-07-30 | 1 | -4/+48
* chocula: crude publisher type bucketing; field cleanup | Bryan Newbold | 2019-07-30 | 2 | -40/+194
* shorter/simpler table names | Bryan Newbold | 2019-07-26 | 2 | -9/+17
* chocula: more host/domain fixes | Bryan Newbold | 2019-07-26 | 1 | -3/+8
* GOLD OA parsing | Bryan Newbold | 2019-07-26 | 1 | -40/+54
* chocula: fix domain parsing | Bryan Newbold | 2019-07-26 | 1 | -10/+47
* pipenv: pytest for journal_metadata | Bryan Newbold | 2019-07-26 | 2 | -4/+83
* chocula README | Bryan Newbold | 2019-07-14 | 1 | -0/+7
* chocula: fetch SZ json | Bryan Newbold | 2019-07-14 | 1 | -0/+2
* more chocula progress | Bryan Newbold | 2019-07-14 | 2 | -61/+183
* EZB and szczepanski indexers | Bryan Newbold | 2019-07-11 | 1 | -45/+146