field | value | date |
---|---|---|
author | Bryan Newbold <bnewbold@archive.org> | 2019-04-24 02:13:00 +0000 |
committer | Bryan Newbold <bnewbold@archive.org> | 2019-04-24 02:13:06 +0000 |
commit | d6457355b5241d32333718ba7aca316976695019 | |
tree | ff5470006439a532f0d0b0209734a32df2903db6 /arabesque.py | |
parent | 71ed3d20c6898df32a31c9b1ecc843e56c976e9d | |
small doc/TODO notes
Diffstat (limited to 'arabesque.py')
-rwxr-xr-x | arabesque.py | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/arabesque.py b/arabesque.py
index 8dbc0ca..55e6223 100755
--- a/arabesque.py
+++ b/arabesque.py
@@ -11,8 +11,8 @@ Commands/modes:
 - backward <input.log> <input-map.sqlite> <output.sqlite>
 - forward <input.seed_identifiers> <output.sqlite>
 - everything <input.log> <input.cdx> <input.seed_identifiers> <output.sqlite>
-- postprocess
-- dump_json
+- postprocess <sha1_status.tsv> <output.sqlite>
+- dump_json <output.sqlite>
 
 Design docs in DESIGN.md
 
@@ -21,8 +21,9 @@ Software under the GPLv3 license (a copy of which should be included with
 this file).
 
 TODO:
+- pass SHA-1 and timestamp in forward mode (?)
+- include final_size (if possible from crawl log)
 - open map in read-only when appropriate
-- some kind of stats dump command? (querying sqlite)
 - should referrer map be UNIQ?
 - forward outputs get generated multiple times?
 - try: https://pypi.org/project/urlcanon/
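The usage lines documented in this diff can be read as a small subcommand table. The following is a minimal sketch, assuming `argparse` subparsers, of how the five modes and their positional arguments might be parsed. It is not taken from arabesque.py: the `MODES` table, the `build_parser` helper, and the underscore-style argument names are hypothetical and chosen only to mirror the docstring.

```python
import argparse

# Hypothetical table mirroring the docstring: mode name -> positional arguments.
# The argument names are illustrative; the real script may name them differently.
MODES = {
    "backward": ["input_log", "input_map_sqlite", "output_sqlite"],
    "forward": ["input_seed_identifiers", "output_sqlite"],
    "everything": ["input_log", "input_cdx", "input_seed_identifiers", "output_sqlite"],
    "postprocess": ["sha1_status_tsv", "output_sqlite"],
    "dump_json": ["output_sqlite"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per documented mode."""
    parser = argparse.ArgumentParser(prog="arabesque.py")
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for mode, positionals in MODES.items():
        sub = subparsers.add_parser(mode)
        for name in positionals:
            sub.add_argument(name)
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real crawl.
    args = build_parser().parse_args(
        ["postprocess", "sha1_status.tsv", "output.sqlite"]
    )
    print(args.mode, args.sha1_status_tsv, args.output_sqlite)
```

Keeping the modes in one table means the docstring and the parser can be checked against each other, which is the kind of drift this commit corrects for `postprocess` and `dump_json`.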