Commit message | Author | Age | Files | Lines
* start a Makefile | Bryan Newbold | 2020-05-07 | 19 | -580/+1039
|     Move all "index" functions into classes, each in a separate file. Add lots of type annotations. Use dataclass objects to hold database rows. This aspect will need further refactoring to remove "extra" usage, probably by adding database rows to align with DatabaseInfo more closely.
* pytest config | Bryan Newbold | 2020-05-06 | 1 | -0/+10
|
* gitlab-ci first attempt | Bryan Newbold | 2020-05-06 | 1 | -0/+15
|
* rename chocula.database | Bryan Newbold | 2020-05-06 | 2 | -1/+1
|
* start refactoring files into module | Bryan Newbold | 2020-05-06 | 7 | -458/+470
|
* pipenv: py37, black, mypy | Bryan Newbold | 2020-05-06 | 2 | -227/+226
|
* update to new(er) ISSN-L mapping file | Bryan Newbold | 2020-05-01 | 2 | -2/+2
|
* move queries list to sqlite-notebook report format | Bryan Newbold | 2019-12-26 | 4 | -116/+1375
|
* update URL crawl status snapshot | Bryan Newbold | 2019-12-26 | 2 | -5/+2
|
* add check to container stat fetch to ensure valid JSON returned | Bryan Newbold | 2019-12-26 | 1 | -1/+1
|
* add stats and URL crawl status files | Bryan Newbold | 2019-12-24 | 2 | -2/+6
|
* count chocula logo (yay) | Bryan Newbold | 2019-12-24 | 1 | -0/+0
|
* example queries to run on sqlite | Bryan Newbold | 2019-12-24 | 2 | -0/+64
|
* update README with better directions | Bryan Newbold | 2019-12-24 | 2 | -16/+48
|
* move old scripts into subdirectory | Bryan Newbold | 2019-12-23 | 3 | -0/+0
|
* update chocula usage of argparse | Bryan Newbold | 2019-12-23 | 1 | -14/+22
|
* update norwegian CSV importer schema | Bryan Newbold | 2019-12-23 | 1 | -2/+4
|
* update chocula input data files | Bryan Newbold | 2019-12-23 | 3 | -38/+35
|     Including updating fetch script, README links, and chocula.py path references.
* use newer fatcat container dump | Bryan Newbold | 2019-09-06 | 2 | -1/+3
|
* filter out bad ISSN{e,p} | Bryan Newbold | 2019-09-06 | 1 | -0/+5
|     Unfortunately a few hundred of these got pushed into fatcat already; will probably fix with a new fixer bot tool.
* last name/publisher cleanups | Bryan Newbold | 2019-09-03 | 1 | -2/+6
|
* update TODO | Bryan Newbold | 2019-09-03 | 1 | -1/+10
|
* don't include doaj.org or NCBI homepage URLs | Bryan Newbold | 2019-09-03 | 1 | -0/+4
|
* improve fatcat_export metadata quality | Bryan Newbold | 2019-09-03 | 1 | -3/+12
|
* fix SZCEPANSKI typo | Bryan Newbold | 2019-09-03 | 1 | -2/+2
|
* improve export_fatcat | Bryan Newbold | 2019-08-28 | 1 | -5/+22
|
* python script to fix fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+75
|
* hand-coded corrections to invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -88/+88
|
* current invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+118
|     AKA, list of fatcat containers with an ISSN-L that isn't a valid ISSN (based on checksum)
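
For readers unfamiliar with the checksum mentioned above, here is a minimal sketch of the standard ISSN check-character test. It is an illustration only and not necessarily how the scripts in this repository do it; the names `issn_check_char` and `is_valid_issn` are hypothetical.

```python
import re

# Syntax check: four digits, a hyphen, three digits, then a digit or "X".
ISSN_RE = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits.

    The digits are weighted 8, 7, ..., 2 from left to right; the check
    value is whatever brings the weighted sum up to a multiple of 11,
    with a value of 10 written as "X".
    """
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Return True if `issn` looks like NNNN-NNNC and its checksum matches."""
    issn = issn.strip().upper()
    if not ISSN_RE.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_char(digits[:7]) == digits[7]
```

A container whose ISSN-L fails `is_valid_issn` would be a candidate for a list like the one this commit adds.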
* only fatcat_export 'valid' (syntax) ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -1/+1
|
* include Szczepanski in everything command (oops) | Bryan Newbold | 2019-08-27 | 1 | -0/+1
|
* updated crossref title file; ISSN-L file link | Bryan Newbold | 2019-08-27 | 3 | -3/+3
|
* update IA_CRAWL_FILE | Bryan Newbold | 2019-07-31 | 1 | -1/+1
|
* commit TODO list | Bryan Newbold | 2019-07-31 | 1 | -0/+37
|
* update fetch.sh with url_status files | Bryan Newbold | 2019-07-31 | 1 | -0/+3
|
* webarchive_urls separate from regular URLs | Bryan Newbold | 2019-07-31 | 1 | -1/+21
|
* don't return 'error' for bad CDX lookups | Bryan Newbold | 2019-07-31 | 1 | -1/+3
|
* add 'export_fatcat' | Bryan Newbold | 2019-07-31 | 1 | -1/+51
|
* README update | Bryan Newbold | 2019-07-31 | 1 | -21/+35
|
* more check_issn_urls corner-cases | Bryan Newbold | 2019-07-31 | 1 | -1/+5
|
* handle 'ttp://' URL prefix corner case | Bryan Newbold | 2019-07-31 | 1 | -0/+2
|
* broader top-level gitignore | Bryan Newbold | 2019-07-31 | 1 | -0/+25
|
* remove python 3.5 constraint | Bryan Newbold | 2019-07-31 | 2 | -6/+4
|
* pipenv: datasette | Bryan Newbold | 2019-07-31 | 2 | -1/+145
|
* add wikidata SPARQL query | Bryan Newbold | 2019-07-31 | 1 | -0/+35
|
* sqlite-notebook template for basic chocula stats | Bryan Newbold | 2019-07-31 | 2 | -0/+186
|
* iterate on homepage url import/stats | Bryan Newbold | 2019-07-31 | 2 | -21/+43
|
* more issn URL checker fixes | Bryan Newbold | 2019-07-31 | 2 | -11/+27
|
* major improvements to ISSN URL checker | Bryan Newbold | 2019-07-30 | 1 | -20/+121
|
* import vanilla ISSN url checker script | Bryan Newbold | 2019-07-30 | 1 | -0/+52
|