
## Upstream Dumps

Open Library publishes monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>

Latest works dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>

TSV columns:

    type - type of record (/type/edition, /type/work etc.)
    key - unique key of the record. (/books/OL1M etc.)
    revision - revision number of the record
    last_modified - last modified timestamp
    JSON - the complete record in JSON format

Preview the JSON column (field 5) of the first ten records:

    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .
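
The same five-column layout can be read from Python. This is a minimal sketch, assuming the dump is tab-separated UTF-8 with the JSON record in the last column, as the column list above describes. `DumpRow`, `parse_dump_line` and `iter_dump` are hypothetical names for illustration, not part of any Open Library tooling.

```python
import gzip
import json
from typing import Iterator, NamedTuple


class DumpRow(NamedTuple):
    """One line of the dump, split into the five documented columns."""
    type: str
    key: str
    revision: int
    last_modified: str
    record: dict


def parse_dump_line(line: str) -> DumpRow:
    # maxsplit=4 keeps everything after the fourth tab together as the JSON column.
    rtype, key, revision, last_modified, raw_json = line.rstrip("\n").split("\t", 4)
    return DumpRow(rtype, key, int(revision), last_modified, json.loads(raw_json))


def iter_dump(path: str) -> Iterator[DumpRow]:
    # Stream the gzipped file line by line so the whole dump is never held in memory.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_dump_line(line)
```

Usage would look like `for row in iter_dump("ol_dump_works_latest.txt.gz"): ...`. Streaming matters here because the decompressed dump is large.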

We are going to want:

- title (with "prefix"?)
- authors
- subtitle
- year
- identifier (work? edition?)
- isbn-13 (if available)
- borrowable or not
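
A rough sketch of pulling the work-level fields out of one parsed JSON record. The field names (`key`, `title`, `subtitle`, `authors` as a list of entries holding an `author` key reference) are assumptions about the work record shape and should be checked against real dump output. Year, isbn-13 and borrowable status are left out on purpose: they are assumed here to live on edition records or elsewhere, not in the works dump. `extract_work_fields` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_work_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the work-level fields we care about; missing fields become None."""
    author_keys: List[str] = []
    for entry in record.get("authors") or []:
        # Assumed shape: {"author": {"key": "/authors/..."}, ...};
        # a plain string is also tolerated in case some records use it.
        author = entry.get("author") if isinstance(entry, dict) else None
        if isinstance(author, dict) and "key" in author:
            author_keys.append(author["key"])
        elif isinstance(author, str):
            author_keys.append(author)
    return {
        "work_key": record.get("key"),
        "title": record.get("title"),
        "subtitle": record.get("subtitle"),
        "author_keys": author_keys,
    }
```

Using `.get()` throughout keeps the extraction tolerant of sparse records, so a missing subtitle or author list does not abort a full pass over the dump.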