Importing fileset and file entities from initial sandcrawler ingests.

Git commit: `ede98644a89afd15d903061e0998dbd08851df6d`

Filesets:

    export FATCAT_AUTH_SANDCRAWLER=[...]
    cat /tmp/ingest_dataset_combined_results.2022-04-04.partial.json \
        | ./fatcat_import.py ingest-fileset-results -
    # editgroup_5l47i7bscvfmpf4ddytauoekea
    # Counter({'total': 195, 'skip': 176, 'skip-hit': 160, 'insert': 19, 'skip-single-file': 14, 'skip-partial-file-info': 2, 'update': 0, 'exists': 0})

    cat /srv/fatcat/datasets/ingest_dataset_combined_results.2022-04-04.partial.json \
        | ./fatcat_import.py ingest-fileset-file-results -
    # editgroup_i2k2ucon7nap3gui3z7amuiug4
    # Counter({'total': 195, 'skip': 184, 'skip-hit': 160, 'skip-status': 24, 'insert': 11, 'update': 0, 'exists': 0})
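
As a quick sanity check on the two counters above, the sketch below verifies that they add up. It assumes (inferred from the numbers, not from the importer's documentation) that the `skip-*` keys break down `skip`, and that `skip`, `insert`, `update` and `exists` together account for `total`. The helper name `check_counts` is hypothetical.

```python
from collections import Counter


def check_counts(counts):
    """Return a list of inconsistencies found in an importer Counter."""
    problems = []

    # Assumption: keys such as 'skip-hit' are sub-reasons that sum to 'skip'.
    skip_detail = sum(v for k, v in counts.items() if k.startswith("skip-"))
    if skip_detail != counts["skip"]:
        problems.append(f"skip-* sums to {skip_detail}, skip is {counts['skip']}")

    # Assumption: every record ends up in exactly one of these outcomes.
    outcomes = sum(counts[k] for k in ("skip", "insert", "update", "exists"))
    if outcomes != counts["total"]:
        problems.append(f"outcomes sum to {outcomes}, total is {counts['total']}")

    return problems


# Counters copied from the two import runs above.
fileset_counts = Counter({'total': 195, 'skip': 176, 'skip-hit': 160, 'insert': 19,
                          'skip-single-file': 14, 'skip-partial-file-info': 2,
                          'update': 0, 'exists': 0})
file_counts = Counter({'total': 195, 'skip': 184, 'skip-hit': 160, 'skip-status': 24,
                       'insert': 11, 'update': 0, 'exists': 0})

print(check_counts(fileset_counts))  # []
print(check_counts(file_counts))     # []
```

Both runs pass: 160 + 14 + 2 = 176 and 176 + 19 = 195 for filesets, 160 + 24 = 184 and 184 + 11 = 195 for files.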

Re-ran both imports to confirm that they do not create duplicate inserts. That
worked: the counters reported 'exists' instead of 'insert'.
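
The re-run check can be expressed as a small comparison between the counter from the first run and the one from the re-run. This is only a sketch: the function name `looks_idempotent` is hypothetical, and it assumes that an idempotent re-run sees the same number of records, inserts nothing, and reports every previously inserted entity under `exists`.

```python
from collections import Counter


def looks_idempotent(first, rerun):
    """True if the re-run inserted nothing and counted prior inserts as existing."""
    return (
        rerun["total"] == first["total"]
        and rerun["insert"] == 0
        # Assumption: earlier inserts (plus anything already present) show up as 'exists'.
        and rerun["exists"] == first["exists"] + first["insert"]
    )
```

To use it, pass the `Counter` printed by the first run and the one printed by the re-run. The re-run counters were not recorded in these notes, so no values are shown here.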

Finally!