Mode | Name | Size |
---|---|---|
-rw-r--r-- | .coveragerc | 62 |
-rw-r--r-- | .gitignore | 38 |
-rw-r--r-- | .pylintrc | 558 |
-rw-r--r-- | Makefile | 883 |
-rw-r--r-- | Pipfile | 1112 |
-rw-r--r-- | Pipfile.lock | 27405 |
-rw-r--r-- | TODO | 236 |
-rw-r--r-- | example.env | 172 |
-rwxr-xr-x | grobid2json.py | 7651 |
-rwxr-xr-x | grobid_tool.py | 5682 |
-rwxr-xr-x | ia_pdf_match.py | 2920 |
-rwxr-xr-x | ingest_file.py | 2818 |
-rwxr-xr-x | pdfextract_tool.py | 5115 |
-rwxr-xr-x | pdftrio_tool.py | 4752 |
-rwxr-xr-x | persist_tool.py | 6778 |
-rw-r--r-- | pytest.ini | 659 |
d--------- | sandcrawler | 526 |
-rwxr-xr-x | sandcrawler_worker.py | 10559 |
d--------- | scripts | 813 |
d--------- | tests | 519 |
l--------- | title_slug_blacklist.txt -> ../scalding/src/main/resources/slug-denylist.txt | 48 |