| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | html: refactors/tweaks from testing | Bryan Newbold | 2020-11-06 | 1 | -12/+18 |
* | initial implementation of HTML ingest in existing worker | Bryan Newbold | 2020-11-04 | 1 | -7/+22 |
* | html: some refactoring | Bryan Newbold | 2020-11-03 | 1 | -13/+16 |
* | move transfer encoding helper to sandcrawler/ia.py | Bryan Newbold | 2020-11-03 | 1 | -22/+16 |
* | html: syntax fixes; resolve relative URLs; extract more XML fulltext URLs | Bryan Newbold | 2020-10-30 | 1 | -3/+3 |
* | html: work around firstmonday DOCTYPE issue | Bryan Newbold | 2020-10-30 | 1 | -0/+3 |
* | html: more ingest improvements | Bryan Newbold | 2020-10-30 | 1 | -18/+118 |
* | html ingest: improve data flow | Bryan Newbold | 2020-10-29 | 1 | -18/+41 |
* | better default CLI output (show usage) | Bryan Newbold | 2020-10-29 | 1 | -1/+1 |
* | html: initial ingest implementation | Bryan Newbold | 2020-10-29 | 1 | -0/+193 |