path: root/python/sandcrawler/html_metadata.py
Commit message | Author | Age | Files | Lines
* html: syntax fixes; resolve relative URLs; extract more XML fulltext URLs | Bryan Newbold | 2020-10-30 | 1 | -5/+12
* html: more ingest improvements | Bryan Newbold | 2020-10-30 | 1 | -0/+2
* html: more biblio selectors; resource extraction | Bryan Newbold | 2020-10-29 | 1 | -0/+102
* HTML meta: more from online hunting/research | Bryan Newbold | 2020-10-27 | 1 | -3/+54
* HTML metadata: fix type warnings | Bryan Newbold | 2020-10-27 | 1 | -1/+3
* start HTML metadata extraction code | Bryan Newbold | 2020-10-27 | 1 | -0/+230