| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | start a Makefile | Bryan Newbold | 2020-05-07 | 19 | -580/+1039 |
| | Move all "index" functions into classes, each in a separate file. Add lots of type annotations. Use dataclass objects to hold database rows. This aspect will need further refactoring to remove "extra" usage, probably by adding database rows to align with DatabaseInfo more closely. | | | | |
* | pytest config | Bryan Newbold | 2020-05-06 | 1 | -0/+10 |
* | gitlab-ci first attempt | Bryan Newbold | 2020-05-06 | 1 | -0/+15 |
* | rename chocula.database | Bryan Newbold | 2020-05-06 | 2 | -1/+1 |
* | start refactoring files into module | Bryan Newbold | 2020-05-06 | 7 | -458/+470 |
* | pipenv: py37, black, mypy | Bryan Newbold | 2020-05-06 | 2 | -227/+226 |
* | update to new(er) ISSN-L mapping file | Bryan Newbold | 2020-05-01 | 2 | -2/+2 |
* | move queries list to sqlite-notebook report format | Bryan Newbold | 2019-12-26 | 4 | -116/+1375 |
* | update URL crawl status snapshot | Bryan Newbold | 2019-12-26 | 2 | -5/+2 |
* | add check to container stat fetch to ensure valid JSON returned | Bryan Newbold | 2019-12-26 | 1 | -1/+1 |
* | add stats and URL crawl status files | Bryan Newbold | 2019-12-24 | 2 | -2/+6 |
* | count chocula logo (yay) | Bryan Newbold | 2019-12-24 | 1 | -0/+0 |
* | example queries to run on sqlite | Bryan Newbold | 2019-12-24 | 2 | -0/+64 |
* | update README with better directions | Bryan Newbold | 2019-12-24 | 2 | -16/+48 |
* | move old scripts into subdirectory | Bryan Newbold | 2019-12-23 | 3 | -0/+0 |
* | update chocula usage of argparse | Bryan Newbold | 2019-12-23 | 1 | -14/+22 |
* | update norwegian CSV importer schema | Bryan Newbold | 2019-12-23 | 1 | -2/+4 |
* | update chocula input data files | Bryan Newbold | 2019-12-23 | 3 | -38/+35 |
| | Including updating fetch script, README links, and chocula.py path references. | | | | |
* | use newer fatcat container dump | Bryan Newbold | 2019-09-06 | 2 | -1/+3 |
* | filter out bad ISSN{e,p} | Bryan Newbold | 2019-09-06 | 1 | -0/+5 |
| | Unfortunately a few hundred of these got pushed into fatcat already; will probably fix with a new fixer bot tool. | | | | |
* | last name/publisher cleanups | Bryan Newbold | 2019-09-03 | 1 | -2/+6 |
* | update TODO | Bryan Newbold | 2019-09-03 | 1 | -1/+10 |
* | don't include doaj.org or NCBI homepage URLs | Bryan Newbold | 2019-09-03 | 1 | -0/+4 |
* | improve fatcat_export metadata quality | Bryan Newbold | 2019-09-03 | 1 | -3/+12 |
* | fix SZCEPANSKI typo | Bryan Newbold | 2019-09-03 | 1 | -2/+2 |
* | improve export_fatcat | Bryan Newbold | 2019-08-28 | 1 | -5/+22 |
* | python script to fix fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+75 |
* | hand-coded corrections to invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -88/+88 |
* | current invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+118 |
| | AKA, list of fatcat containers with an ISSN-L that isn't a valid ISSN (based on checksum) | | | | |
* | only fatcat_export 'valid' (syntax) ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -1/+1 |
* | include Szczepanski in everything command (oops) | Bryan Newbold | 2019-08-27 | 1 | -0/+1 |
* | updated crossref title file; ISSN-L file link | Bryan Newbold | 2019-08-27 | 3 | -3/+3 |
* | update IA_CRAWL_FILE | Bryan Newbold | 2019-07-31 | 1 | -1/+1 |
* | commit TODO list | Bryan Newbold | 2019-07-31 | 1 | -0/+37 |
* | update fetch.sh with url_status files | Bryan Newbold | 2019-07-31 | 1 | -0/+3 |
* | webarchive_urls separate from regular URLs | Bryan Newbold | 2019-07-31 | 1 | -1/+21 |
* | don't return 'error' for bad CDX lookups | Bryan Newbold | 2019-07-31 | 1 | -1/+3 |
* | add 'export_fatcat' | Bryan Newbold | 2019-07-31 | 1 | -1/+51 |
* | README update | Bryan Newbold | 2019-07-31 | 1 | -21/+35 |
* | more check_issn_urls corner-cases | Bryan Newbold | 2019-07-31 | 1 | -1/+5 |
* | handle 'ttp://' URL prefix corner case | Bryan Newbold | 2019-07-31 | 1 | -0/+2 |
* | broader top-level gitignore | Bryan Newbold | 2019-07-31 | 1 | -0/+25 |
* | remove python 3.5 constraint | Bryan Newbold | 2019-07-31 | 2 | -6/+4 |
* | pipenv: datasette | Bryan Newbold | 2019-07-31 | 2 | -1/+145 |
* | add wikidata SPARQL query | Bryan Newbold | 2019-07-31 | 1 | -0/+35 |
* | sqlite-notebook template for basic chocula stats | Bryan Newbold | 2019-07-31 | 2 | -0/+186 |
* | iterate on homepage url import/stats | Bryan Newbold | 2019-07-31 | 2 | -21/+43 |
* | more issn URL checker fixes | Bryan Newbold | 2019-07-31 | 2 | -11/+27 |
* | major improvements to ISSN URL checker | Bryan Newbold | 2019-07-30 | 1 | -20/+121 |
* | import vanilla ISSN url checker script | Bryan Newbold | 2019-07-30 | 1 | -0/+52 |
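
The "current invalid fatcat ISSN-Ls" entry in the log describes a list of containers whose ISSN-L "isn't a valid ISSN (based on checksum)". As a rough illustration of that kind of check, and not the code from this repository, the standard ISSN rule weights the first seven digits from 8 down to 2, sums them, and derives the eighth character from the sum modulo 11 (with `X` standing for 10). The function names `issn_check_char` and `is_valid_issn` below are hypothetical, chosen only for this sketch.

```python
import re

# Syntax check only: four digits, a hyphen, three digits, then a digit or X.
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dX]")


def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits.

    Hypothetical helper for illustration; assumes `first_seven` is
    exactly seven ASCII digits.
    """
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Return True if `issn` has valid syntax and a matching check character."""
    issn = issn.strip().upper()
    if not ISSN_PATTERN.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_char(digits[:7]) == digits[7]


if __name__ == "__main__":
    for candidate in ["0378-5955", "2434-561X", "0378-5954", "1234-567"]:
        print(candidate, is_valid_issn(candidate))
```

Separating the syntax test from the checksum test mirrors the distinction the log itself draws between "'valid' (syntax)" ISSN-Ls and ones that are valid "based on checksum".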