| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | html extract: protocols.io, fix americanarchivist | Bryan Newbold | 2020-01-10 | 1 | -1/+7 |
* | more ingest HTML extraction hacks | Bryan Newbold | 2020-01-10 | 1 | -6/+46 |
* | many publisher-specific ingest improvements | Bryan Newbold | 2020-01-10 | 1 | -4/+96 |
* | fill in more html extraction techniques | Bryan Newbold | 2020-01-09 | 1 | -7/+6 |
* | refactor: use print(..., file=sys.stderr) | Bryan Newbold | 2019-12-18 | 1 | -1/+1 |
* | start of hrmars.com ingest support | Bryan Newbold | 2019-11-14 | 1 | -0/+2 |
* | citation_pdf_url with host-relative URLs | Bryan Newbold | 2019-11-13 | 1 | -1/+3 |
* | more progress on file ingest | Bryan Newbold | 2019-11-13 | 1 | -0/+19 |
* | much progress on file ingest path | Bryan Newbold | 2019-10-22 | 1 | -0/+73 |