ingest crawler:
- SPNv2 only
- remove most SPNv1/v2 path selection
- landing page + fulltext hops only (short recursion depth)
- use wayback client library instead of requests to fetch content
- rate-limit outgoing requests (candidate library: https://pypi.org/project/ratelimit/)
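
Rough sketch of the "landing page + fulltext hops only" idea combined with request throttling, assuming a breadth-first walk with a hard hop limit. Everything named here is hypothetical and not taken from the existing crawler: `crawl_hops`, `MinIntervalLimiter`, and the `fetch` / `next_hops` callables (`fetch` stands in for whatever the wayback client library ends up exposing; `next_hops` stands in for the fulltext-link extraction). The limiter is hand-rolled with the standard library only; the ratelimit package linked above could probably replace it.

```python
import time
from typing import Any, Callable, Iterable, List, Optional, Tuple


class MinIntervalLimiter:
    """Hypothetical helper: allow at most one call per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl_hops(
    start_url: str,
    fetch: Callable[[str], Any],
    next_hops: Callable[[str, Any], Iterable[str]],
    max_hops: int = 2,
    limiter: Optional[MinIntervalLimiter] = None,
) -> List[Tuple[str, int]]:
    """Visit start_url (depth 0) and follow hops up to depth `max_hops`.

    Returns the visited (url, depth) pairs in breadth-first order.
    """
    visited: List[Tuple[str, int]] = []
    seen = set()
    frontier: List[Tuple[str, int]] = [(start_url, 0)]
    while frontier:
        url, depth = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        if limiter is not None:
            limiter.wait()  # throttle before every fetch
        body = fetch(url)
        visited.append((url, depth))
        # Short recursion depth: never expand a page already at the hop limit.
        if depth < max_hops:
            for nxt in next_hops(url, body):
                if nxt not in seen:
                    frontier.append((nxt, depth + 1))
    return visited
```

Passing `fetch` and `next_hops` in as callables keeps the hop logic testable without network access, and lets the fetch side switch from requests to the wayback client without touching the traversal.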