-rwxr-xr-x  arabesque.py  7
1 files changed, 4 insertions, 3 deletions
diff --git a/arabesque.py b/arabesque.py
index 8dbc0ca..55e6223 100755
--- a/arabesque.py
+++ b/arabesque.py
@@ -11,8 +11,8 @@ Commands/modes:
 - backward <input.log> <input-map.sqlite> <output.sqlite>
 - forward <input.seed_identifiers> <output.sqlite>
 - everything <input.log> <input.cdx> <input.seed_identifiers> <output.sqlite>
-- postprocess
-- dump_json
+- postprocess <sha1_status.tsv> <output.sqlite>
+- dump_json <output.sqlite>
 
 Design docs in DESIGN.md
 
@@ -21,8 +21,9 @@ Software under the GPLv3 license (a copy of which should be
 included with this file).
 
 TODO:
+- pass SHA-1 and timestamp in forward mode (?)
+- include final_size (if possible from crawl log)
 - open map in read-only when appropriate
-- some kind of stats dump command? (querying sqlite)
 - should referrer map be UNIQ?
 - forward outputs get generated multiple times?
 - try: https://pypi.org/project/urlcanon/
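
The TODO list in this diff includes opening the map database read-only when appropriate. One way this might be done with Python's standard `sqlite3` module is sketched below. The helper name `open_map_readonly` is hypothetical and does not appear in arabesque.py; the sketch relies only on SQLite's `mode=ro` URI parameter and makes no assumption about how the script currently opens its databases.

```python
import sqlite3
from pathlib import Path


def open_map_readonly(path):
    """Hypothetical helper (not part of arabesque.py).

    Opens an existing SQLite file in read-only mode. Raises
    sqlite3.OperationalError if the file does not exist, because
    mode=ro never creates a database.
    """
    # as_uri() requires an absolute path and percent-encodes special
    # characters, so resolve the path first and append the URI parameter.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection opened this way, `SELECT` queries work as usual, while any `INSERT`, `UPDATE`, or `CREATE` statement fails with `sqlite3.OperationalError`, which guards an input map against accidental modification.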