From fc4ca558e329d878a430dd5241bf0195e0998f10 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Thu, 28 Feb 2019 12:32:28 -0800 Subject: update examples; create README --- DESIGN.md | 327 +++++++++++++++++++++++++++++ README.md | 354 ++++--------------------------- arabesque.py | 2 +- examples/report_template.md | 108 ++++++++++ examples/seed_doi.tsv | 494 ++++++++++++++++++++++++++++++++++++++++++++ report_template.md | 108 ---------- test.sqlite | Bin 57344 -> 0 bytes test.tsv | 10 - 8 files changed, 972 insertions(+), 431 deletions(-) create mode 100644 DESIGN.md create mode 100644 examples/report_template.md create mode 100644 examples/seed_doi.tsv delete mode 100644 report_template.md delete mode 100644 test.sqlite delete mode 100644 test.tsv diff --git a/DESIGN.md b/DESIGN.md new file mode 100644 index 0000000..2ae80e8 --- /dev/null +++ b/DESIGN.md @@ -0,0 +1,327 @@ + +Going to look something like: + + zcat DOI-LANDING-CRAWL-2018-06-full_crawl_logs/DOI-LANDING-CRAWL-2018-06.$SHARD.us.archive.org.crawl.log.gz | tr -cd '[[:print:]]\n\r\t' | rg '//doi.org/' | /fast/scratch/unpaywall/make_doi_list.py > doi_list.$SHARD.txt + + zcat /fast/unpaywall-munging/DOI-LANDING-CRAWL-2018-06/DOI-LANDING-CRAWL-2018-06-full_crawl_logs/DOI-LANDING-CRAWL-2018-06.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD.db + + cat /fast/unpaywall-munging/DOI-LANDING-CRAWL-2018-06/doi_list.$SHARD.txt | pv | /fast/scratch/unpaywall/make_output.py redirectmap.$SHARD.db > doi_index.$SHARD.tsv + +Let's start with: + + mkdir UNPAYWALL-PDF-CRAWL-2018-07 + ia download UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs + +export SHARD=wbgrp-svc279 # running +export SHARD=wbgrp-svc280 # running +export SHARD=wbgrp-svc281 # running +export SHARD=wbgrp-svc282 # running +zcat UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs/UNPAYWALL-PDF-CRAWL-2018-07.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD.db +zcat 
UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs/UNPAYWALL-PDF-CRAWL-2018-07-PATCH.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD-PATCH.db + +### Design + +If possible, we'd like something that will work with as many crawls as +possible. Want to work with shards, then merge outputs. + +Output: JSON and/or sqlite rows with: + +- identifier (optional?) +- initial-uri (indexed) +- breadcrumbs +- final-uri +- final-http-status +- final-sha1 +- final-mimetype-normalized +- final-was-dedupe (boolean) +- final-cdx (string, if would be extracted) + +This will allow filtering on various fields, checking success stats, etc. + +Components: + +- {identifier, initial-uri} input (basically, seedlist) +- full crawl logs +- raw CDX, indexed by final-uri +- referer map + +Process: + +- use full crawl logs to generate a referer map; this is a dict with keys as + URI, and value as {referer URI, status, breadcrumb, was-dedupe, mimetype}; + the referer may be null. database can be whatever. +- iterate through CDX, filtering by HTTP status and mimetype (including + revisits). for each potential, lookup in referer map. if mimetype is + confirmed, then iterate through full referer chain, and print a final line + which is all-but-identifier +- iterate through identifier/URI list, inserting identifier columns + +Complications: + +- non-PDF terminals: error codes, or HTML only (failed to find PDF) +- multiple terminals per seed; eg, multiple PDFs, or PDF+postscript+HTML or + whatever + +Process #2: + +- use full crawl logs to generate a bi-directional referer map: sqlite3 table + with uri, referer-uri both indexed. also {status, breadcrumb, was-dedupe, + mimetype} rows +- iterate through CDX, selecting successful "terminal" lines (mime, status). + use referer map to iterate back to an initial URI, and generate a row. 
lookup + output table by initial-uri; if an entry already exists, behavior is + flag-dependent: overwrite if "better", or add a second line +- in a second pass, update rows with identifier based on URI. if rows not + found/updated, then do a "forwards" lookup to a terminal condition, and write + that status. note that these rows won't have CDX. + +More Complications: + +- handling revisits correctly... raw CDX probably not actually helpful for PDF + case, only landing/HTML case +- given above, should probably just (or as a mode) iterate over only crawl logs + in "backwards" stage +- fan-out of "forward" redirect map, in the case of embeds and PDF link + extraction +- could pull out first and final URI domains for easier SQL stats/reporting +- should include final datetime (for wayback lookups) + +NOTE/TODO: journal-level dumps of fatcat metadata would be cool... could +roll-up release dumps as an alternative to hitting elasticsearch? or just hit +elasticsearch and both dump to sqlite and enrich elastic doc? should probably +have an indexed "last updated" timestamp in all elastic docs + +### Crawl Log Notes + +Fields: + + 0 timestamp (ISO8601) of log line + 1 status code (HTTP or negative) + 2 size in bytes (content only) + 3 URI of this download + 4 discovery breadcrumbs + 5 "referer" URI + 6 mimetype (as reported?) + 7 worker thread + 8 full timestamp (start of network fetch; this is dt?) + 9 SHA1 + 10 source tag + 11 annotations + 12 partial CDX JSON + +### External Prep for, Eg, Unpaywall Crawl + + export LC_ALL=C + sort -S 8G -u seedlist.shard > seedlist.shard.sorted + + zcat unpaywall_20180621.pdf_meta.tsv.gz | awk '{print $2 "\t" $1}' | sort -S 8G -u > unpaywall_20180621.seed_id.tsv + + join -t $'\t' unpaywall_20180621.seed_id.tsv unpaywall_crawl_patch_seedlist.split_3.schedule.sorted > seed_id.shard.tsv + +TODO: why don't these sum/match correctly? 
+ + bnewbold@orithena$ wc -l seed_id.shard.tsv unpaywall_crawl_patch_seedlist.split_3.schedule.sorted + 880737 seed_id.shard.tsv + 929459 unpaywall_crawl_patch_seedlist.split_3.schedule.sorted + + why is: + http://00ec89c.netsolhost.com/brochures/200605_JAWMA_Hg_Paper_Lee_Hastings.pdf + in unpaywall_crawl_patch_seedlist, but not unpaywall_20180621.pdf_meta? + + # Can't even filter on HTTP 200, because revisits are '-' + #zcat UNPAYWALL-PDF-CRAWL-2018-07.cdx.gz | rg 'wbgrp-svc282' | rg ' 200 ' | rg '(pdf)|(revisit)' > UNPAYWALL-PDF-CRAWL-2018-07.svc282.filtered.cdx + + zcat UNPAYWALL-PDF-CRAWL-2018-07.cdx.gz | rg 'UNPAYWALL-PDF-CRAWL-2018-07-PATCH' | rg 'wbgrp-svc282' | rg '(pdf)|( warc/revisit )|(postscript)|( unk )' > UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx + +TODO: spaces in URLs, like 'https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A case.pdf' + +### Revisit Notes + +Neither CDX nor crawl logs seem to have revisits actually point to final +content, they just point to the revisit record in the (crawl-local) WARC. + +### sqlite3 stats + + select count(*) from crawl_result; + + select count(*) from crawl_result where identifier is null; + + select breadcrumbs, count(*) from crawl_result group by breadcrumbs; + + select final_was_dedupe, count(*) from crawl_result group by final_was_dedupe; + + select final_http_status, count(*) from crawl_result group by final_http_status; + + select final_mimetype, count(*) from crawl_result group by final_mimetype; + + select * from crawl_result where final_mimetype = 'text/html' and final_http_status = '200' order by random() limit 5; + + select count(*) from crawl_result where final_uri like 'https://academic.oup.com/Govern%'; + + select count(distinct identifier) from crawl_result where final_sha1 is not null; + +### testing shard notes + +880737 `seed_id` lines +21776 breadcrumbs are null (no crawl logs line); mostly normalized URLs? +24985 "first" URIs with no identifier; mostly normalized URLs? 
+ +backward: Counter({'skip-cdx-scope': 807248, 'inserted': 370309, 'skip-map-scope': 2913}) +forward (dirty): Counter({'inserted': 509242, 'existing-id-updated': 347218, 'map-uri-missing': 15556, 'existing-complete': 8721, '_normalized-seed-uri': 5520}) + +874131 identifier is not null +881551 breadcrumbs is not null +376057 final_mimetype is application/pdf +370309 final_sha1 is not null +332931 application/pdf in UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx + +summary: + 370309/874131 42% got a PDF + 264331/874131 30% some domain dead-end + 196747/874131 23% onlinelibrary.wiley.com + 33879/874131 4% www.nature.com + 11074/874131 1% www.tandfonline.com + 125883/874131 14% blocked, 404, other crawl failures + select count(*) from crawl_result where final_http_status >= '400' or final_http_status < '200'; + 121028/874131 14% HTTP 200, but not pdf + 105317/874131 12% academic.oup.com; all rate-limited or cookie fail + 15596/874131 1.7% didn't even try crawling (null final status) + +TODO: +- add "success" flag (instead of "final_sha1 is null") +- + + http://oriental-world.org.ua/sites/default/files/Archive/2017/3/4.pdf 10.15407/orientw2017.03.021 - http://oriental-world.org.ua/sites/default/files/Archive/2017/3/4.pdf 403 ¤ application/pdf 0 ¤ + +Iterated: + +./arabesque.py backward UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx map.sqlite out.sqlite +Counter({'skip-cdx-scope': 813760, 'inserted': 370435, 'skip-map-scope': 4620, 'skip-tiny-octetstream-': 30}) + +./arabesque.py forward unpaywall_20180621.seed_id.shard.tsv map.sqlite out.sqlite +Counter({'inserted': 523594, 'existing-id-updated': 350009, '_normalized-seed-uri': 21371, 'existing-complete': 6638, 'map-uri-missing': 496}) + +894029 breadcrumbs is not null +874102 identifier is not null +20423 identifier is null +496 breadcrumbs is null +370435 final_sha1 is not null + +### URL/seed non-match issues! 
+ +Easily fixable: +- capitalization of domains +- empty port number, like `http://genesis.mi.ras.ru:/~razborov/hadamard.ps` + +Encodable: +- URL encoding + http://accounting.rutgers.edu/docs/seminars/Fall11/Clawbacks_9-27-11[1].pdf + http://accounting.rutgers.edu/docs/seminars/Fall11/Clawbacks_9-27-11%5B1%5D.pdf +- whitespace in URL (should be url-encoded) + https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A case.pdf + https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A%EF%BF%BD%EF%BF%BDcase.pdf +- tricky hidden unicode + http://goldhorde.ru/wp-content/uploads/2017/03/ЗО-1-2017-206-212.pdf + http://goldhorde.ru/wp-content/uploads/2017/03/%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD-1-2017-206-212.pdf + +Harder/Custom? +- paths including "/../" or "/./" are collapsed +- port number 80, like `http://fermet.misis.ru:80/jour/article/download/724/700` +- aos2.uniba.it:8080papers + +- fragments stripped by crawler: 'https://www.termedia.pl/Journal/-85/pdf-27083-10?filename=BTA#415-06-str307-316.pdf' + +### Debugging "redirect terminal" issue + +Some are redirect loops; fine. + +Some are from 'cookieSet=1' redirects, like 'http://journals.sagepub.com/doi/pdf/10.1177/105971230601400206?cookieSet=1'. This comes through like: + + sqlite> select * from crawl_result where initial_uri = 'http://adb.sagepub.com/cgi/reprint/14/2/147.pdf'; + initial_uri identifier breadcrumbs final_uri final_http_status final_sha1 final_mimetype final_was_dedupe final_cdx + http://adb.sagepub.com/cgi/reprint/14/2/147.pdf 10.1177/105971230601400206 R http://journals.sagepub.com/doi/pdf/10.1177/105971230601400206 302 ¤ text/html 0 ¤ + +Using 'http' (note: this is not an OA article): + + http://adb.sagepub.com/cgi/reprint/14/2/147.pdf + https://journals.sagepub.com/doi/pdf/10.1177/105971230601400206 + https://journals.sagepub.com/doi/pdf/10.1177/105971230601400206?cookieSet=1 + http://journals.sagepub.com/action/cookieAbsent + +Is heritrix refusing to do that second redirect? 
In some cases it will do at +least the first, like: + + http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.1.1996385 + http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.1.1996385?cookieSet=1 + http://pubs.rsna.org/action/cookieAbsent + +I think the vast majority of redirect terminals are when we redirect to a page +that has already been crawled. This is a bummer because we can't find the +redirect target in the logs. + +Eg, academic.oup.com sometimes redirects to cookieSet, then cookieAbsent; other +times it redirects to Governer. It's important to distinguish between these. + +### Scratch + +What are actual advantages/use-cases of CDX mode? +=> easier CDX-to-WARC output mode +=> sending CDX along with WARCs as an index + +Interested in scale-up behavior: full unpaywall PDF crawls, and/or full DOI landing crawls +=> eatmydata +dentifier is not null + + + zcat UNPAYWALL-PDF-CRAWL-2018-07-PATCH* | time /fast/scratch/unpaywall/arabesque.py referrer - UNPAYWALL-PDF-CRAWL-2018-07-PATCH.map.sqlite + [snip] + ... referrer 5542000 + Referrer map complete. + 317.87user 274.57system 21:20.22elapsed 46%CPU (0avgtext+0avgdata 22992maxresident)k + 24inputs+155168464outputs (0major+802114minor)pagefaults 0swaps + + bnewbold@ia601101$ ls -lathr + -rw-r--r-- 1 bnewbold bnewbold 1.7G Dec 12 12:33 UNPAYWALL-PDF-CRAWL-2018-07-PATCH.map.sqlite + +Scaling! + + 16,736,800 UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.us.archive.org.crawl.log + 17,215,895 unpaywall_20180621.seed_id.tsv + +Oops; need to shard the seed_id file. + +Ugh, this one is a little derp because I didn't sort correctly. Let's say close enough though... 
+ + 4318674 unpaywall_crawl_seedlist.svc282.tsv + 3901403 UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.seed_id.tsv + + +/fast/scratch/unpaywall/arabesque.py everything CORE-UPSTREAM-CRAWL-2018-11.combined.log core_2018-03-01_metadata.seed_id.tsv CORE-UPSTREAM-CRAWL-2018-11.out.sqlite + + Counter({'inserted': 3226191, 'skip-log-scope': 2811395, 'skip-log-prereq': 108932, 'skip-tiny-octetstream-': 855, 'skip-map-scope': 2}) + Counter({'existing-id-updated': 3221984, 'inserted': 809994, 'existing-complete': 228909, '_normalized-seed-uri': 17287, 'map-uri-missing': 2511, '_redirect-recursion-limit': 221, 'skip-bad-seed-uri': 17}) + +time /fast/scratch/unpaywall/arabesque.py everything UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.us.archive.org.crawl.log UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.seed_id.tsv UNPAYWALL-PDF-CRAWL-2018-07.out.sqlite + + Everything complete! + Counter({'skip-log-scope': 13476816, 'inserted': 2536452, 'skip-log-prereq': 682460, 'skip-tiny-octetstream-': 41067}) + Counter({'existing-id-updated': 1652463, 'map-uri-missing': 1245789, 'inserted': 608802, 'existing-complete': 394349, '_normalized-seed-uri': 22573, '_redirect-recursion-limit': 157}) + + real 63m42.124s + user 53m31.007s + sys 6m50.535s + +### Performance + +Before tweaks: + + real 2m55.975s + user 2m6.772s + sys 0m12.684s + +After: + + real 1m51.500s + user 1m44.600s + sys 0m3.496s + diff --git a/README.md b/README.md index 2ae80e8..3673eaf 100644 --- a/README.md +++ b/README.md @@ -1,327 +1,57 @@ -Going to look something like: - zcat DOI-LANDING-CRAWL-2018-06-full_crawl_logs/DOI-LANDING-CRAWL-2018-06.$SHARD.us.archive.org.crawl.log.gz | tr -cd '[[:print:]]\n\r\t' | rg '//doi.org/' | /fast/scratch/unpaywall/make_doi_list.py > doi_list.$SHARD.txt + _ + | | + __, ,_ __, | | _ , __, _ + / | / | / | |/ \_|/ / \_/ | | | |/ + \_/|_/ |_/\_/|_/\_/ |__/ \/ \_/|_/ \_/|_/|__/ + |\ + |/ - zcat 
/fast/unpaywall-munging/DOI-LANDING-CRAWL-2018-06/DOI-LANDING-CRAWL-2018-06-full_crawl_logs/DOI-LANDING-CRAWL-2018-06.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD.db - cat /fast/unpaywall-munging/DOI-LANDING-CRAWL-2018-06/doi_list.$SHARD.txt | pv | /fast/scratch/unpaywall/make_output.py redirectmap.$SHARD.db > doi_index.$SHARD.tsv +A simple python3 script to summarize Heritrix3 web crawl logs for a particular +style of crawl: fetching large numbers of files associated with a persistent +identifier. For example, crawling tens of millions of Open Access PDFs (via +direct link or landing page URL) associated with a DOI. -Let's start with: +Output is a (large) sqlite3 database file. Combine with `sqlite-notebook` to +generate HTML reports: - mkdir UNPAYWALL-PDF-CRAWL-2018-07 - ia download UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs + https://github.com/bnewbold/sqlite-notebook -export SHARD=wbgrp-svc279 # running -export SHARD=wbgrp-svc280 # running -export SHARD=wbgrp-svc281 # running -export SHARD=wbgrp-svc282 # running -zcat UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs/UNPAYWALL-PDF-CRAWL-2018-07.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD.db -zcat UNPAYWALL-PDF-CRAWL-2018-07-full_crawl_logs/UNPAYWALL-PDF-CRAWL-2018-07-PATCH.$SHARD.us.archive.org.crawl.log.gz | pv | /fast/scratch/unpaywall/make_map.py redirectmap.$SHARD-PATCH.db +The simplest usage is to specify a seed-url/identifier mapping, a crawl log, +and an output database file name: -### Design + ./arabesque.py everything examples/crawl.log examples/seed_doi.tsv output.sqlite3 -If possible, we'd like something that will work with as many crawls as -possible. Want to work with shards, then merge outputs. +Then generate an HTML report: -Output: JSON and/or sqlite rows with: + sqlite-notebook.py examples/report_template.md output.sqlite3 > report.html -- identifier (optional?) 
-- initial-uri (indexed) -- breadcrumbs -- final-uri -- final-http-status -- final-sha1 -- final-mimetype-normalized -- final-was-dedupe (boolean) -- final-cdx (string, if would be extracted) +The core feature of this script is to resolve HTTP redirect chains. In the +"backward" mode, all terminal responses (HTTP 200) that are in-scope (by +mimetype) are resolved back to their original seed URL. There may be multiple +in-scope terminal responses per seed (eg, via embeds or other URL extraction +beans). In the "forward" mode, redirects are resolved to a single terminal +response (if there is one), which may be 4xx, 5xx, or other failure response +code. -This will allow filtering on various fields, checking success stats, etc. +The result is a single summary table with the following SQL schema: -Components: + CREATE TABLE IF NOT EXISTS crawl_result + (initial_url text NOT NULL, + identifier text, + initial_domain text, + breadcrumbs text, + final_url text, + final_domain text, + final_timestamp text, + final_status_code text, + final_sha1 text, + final_mimetype text, + final_was_dedupe bool, + hit bool); -- {identifier, initial-uri} input (basically, seedlist) -- full crawl logs -- raw CDX, indexed by final-uri -- referer map - -Process: - -- use full crawl logs to generate a referer map; this is a dict with keys as - URI, and value as {referer URI, status, breadcrumb, was-dedupe, mimetype}; - the referer may be null. database can be whatever. -- iterate through CDX, filtering by HTTP status and mimetype (including - revists). for each potential, lookup in referer map. 
if mimetype is - confirmed, then iterate through full referer chain, and print a final line - which is all-but-identifier -- iterate through identifier/URI list, inserting identifier columns - -Complications: - -- non-PDF terminals: error codes, or HTML only (failed to find PDF) -- multiple terminals per seed; eg, multiple PDFs, or PDF+postscript+HTML or - whatever - -Process #2: - -- use full crawl logs to generate a bi-directional referer map: sqlite3 table - with uri, referer-uri both indexed. also {status, breadcrumb, was-dedupe, - mimetype} rows -- iterate through CDX, selecting successful "terminal" lines (mime, status). - use referer map to iterate back to an initial URI, and generate a row. lookup - output table by initial-uri; if an entry already exists, behavior is - flag-dependent: overwrite if "better", or add a second line -- in a second pass, update rows with identifier based on URI. if rows not - found/updated, then do a "forwards" lookup to a terminal condition, and write - that status. note that these rows won't have CDX. - -More Complications: - -- handling revisits correctly... raw CDX probably not actually helpful for PDF - case, only landing/HTML case -- given above, should probably just (or as a mode) iterate over only crawl logs - in "backwards" stage -- fan-out of "forward" redirect map, in the case of embeds and PDF link - extraction -- could pull out first and final URI domains for easier SQL stats/reporting -- should include final datetime (for wayback lookups) - -NOTE/TODO: journal-level dumps of fatcat metadata would be cool... could -roll-up release dumps as an alternative to hitting elasticsearch? or just hit -elasticsearch and both dump to sqlite and enrich elastic doc? 
should probably -have an indexed "last updated" timestamp in all elastic docs - -### Crawl Log Notes - -Fields: - - 0 timestamp (ISO8601) of log line - 1 status code (HTTP or negative) - 2 size in bytes (content only) - 3 URI of this download - 4 discovery breadcrumbs - 5 "referer" URI - 6 mimetype (as reported?) - 7 worker thread - 8 full timestamp (start of network fetch; this is dt?) - 9 SHA1 - 10 source tag - 11 annotations - 12 partial CDX JSON - -### External Prep for, Eg, Unpaywall Crawl - - export LC_ALL=C - sort -S 8G -u seedlist.shard > seedlist.shard.sorted - - zcat unpaywall_20180621.pdf_meta.tsv.gz | awk '{print $2 "\t" $1}' | sort -S 8G -u > unpaywall_20180621.seed_id.tsv - - join -t $'\t' unpaywall_20180621.seed_id.tsv unpaywall_crawl_patch_seedlist.split_3.schedule.sorted > seed_id.shard.tsv - -TODO: why don't these sum/match correctly? - - bnewbold@orithena$ wc -l seed_id.shard.tsv unpaywall_crawl_patch_seedlist.split_3.schedule.sorted - 880737 seed_id.shard.tsv - 929459 unpaywall_crawl_patch_seedlist.split_3.schedule.sorted - - why is: - http://00ec89c.netsolhost.com/brochures/200605_JAWMA_Hg_Paper_Lee_Hastings.pdf - in unpaywall_crawl_patch_seedlist, but not unpaywall_20180621.pdf_meta? - - # Can't even filter on HTTP 200, because revisits are '-' - #zcat UNPAYWALL-PDF-CRAWL-2018-07.cdx.gz | rg 'wbgrp-svc282' | rg ' 200 ' | rg '(pdf)|(revisit)' > UNPAYWALL-PDF-CRAWL-2018-07.svc282.filtered.cdx - - zcat UNPAYWALL-PDF-CRAWL-2018-07.cdx.gz | rg 'UNPAYWALL-PDF-CRAWL-2018-07-PATCH' | rg 'wbgrp-svc282' | rg '(pdf)|( warc/revisit )|(postscript)|( unk )' > UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx - -TODO: spaces in URLs, like 'https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A case.pdf' - -### Revisit Notes - -Neither CDX nor crawl logs seem to have revisits actually point to final -content, they just point to the revisit record in the (crawl-local) WARC. 
- -### sqlite3 stats - - select count(*) from crawl_result; - - select count(*) from crawl_result where identifier is null; - - select breadcrumbs, count(*) from crawl_result group by breadcrumbs; - - select final_was_dedupe, count(*) from crawl_result group by final_was_dedupe; - - select final_http_status, count(*) from crawl_result group by final_http_status; - - select final_mimetype, count(*) from crawl_result group by final_mimetype; - - select * from crawl_result where final_mimetype = 'text/html' and final_http_status = '200' order by random() limit 5; - - select count(*) from crawl_result where final_uri like 'https://academic.oup.com/Govern%'; - - select count(distinct identifier) from crawl_result where final_sha1 is not null; - -### testing shard notes - -880737 `seed_id` lines -21776 breadcrumbs are null (no crawl logs line); mostly normalized URLs? -24985 "first" URIs with no identifier; mostly normalized URLs? - -backward: Counter({'skip-cdx-scope': 807248, 'inserted': 370309, 'skip-map-scope': 2913}) -forward (dirty): Counter({'inserted': 509242, 'existing-id-updated': 347218, 'map-uri-missing': 15556, 'existing-complete': 8721, '_normalized-seed-uri': 5520}) - -874131 identifier is not null -881551 breadcrumbs is not null -376057 final_mimetype is application/pdf -370309 final_sha1 is not null -332931 application/pdf in UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx - -summary: - 370309/874131 42% got a PDF - 264331/874131 30% some domain dead-end - 196747/874131 23% onlinelibrary.wiley.com - 33879/874131 4% www.nature.com - 11074/874131 1% www.tandfonline.com - 125883/874131 14% blocked, 404, other crawl failures - select count(*) from crawl_result where final_http_status >= '400' or final_http_status < '200'; - 121028/874131 14% HTTP 200, but not pdf - 105317/874131 12% academic.oup.com; all rate-limited or cookie fail - 15596/874131 1.7% didn't even try crawling (null final status) - -TODO: -- add "success" flag (instead of "final_sha1 is 
null") -- - - http://oriental-world.org.ua/sites/default/files/Archive/2017/3/4.pdf 10.15407/orientw2017.03.021 - http://oriental-world.org.ua/sites/default/files/Archive/2017/3/4.pdf 403 ¤ application/pdf 0 ¤ - -Iterated: - -./arabesque.py backward UNPAYWALL-PDF-CRAWL-2018-07-PATCH.svc282.filtered.cdx map.sqlite out.sqlite -Counter({'skip-cdx-scope': 813760, 'inserted': 370435, 'skip-map-scope': 4620, 'skip-tiny-octetstream-': 30}) - -./arabesque.py forward unpaywall_20180621.seed_id.shard.tsv map.sqlite out.sqlite -Counter({'inserted': 523594, 'existing-id-updated': 350009, '_normalized-seed-uri': 21371, 'existing-complete': 6638, 'map-uri-missing': 496}) - -894029 breadcrumbs is not null -874102 identifier is not null -20423 identifier is null -496 breadcrumbs is null -370435 final_sha1 is not null - -### URL/seed non-match issues! - -Easily fixable: -- capitalization of domains -- empty port number, like `http://genesis.mi.ras.ru:/~razborov/hadamard.ps` - -Encodable: -- URL encoding - http://accounting.rutgers.edu/docs/seminars/Fall11/Clawbacks_9-27-11[1].pdf - http://accounting.rutgers.edu/docs/seminars/Fall11/Clawbacks_9-27-11%5B1%5D.pdf -- whitespace in URL (should be url-encoded) - https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A case.pdf - https://www.termedia.pl/Journal/-7/pdf-27330-10?filename=A%EF%BF%BD%EF%BF%BDcase.pdf -- tricky hidden unicode - http://goldhorde.ru/wp-content/uploads/2017/03/ЗО-1-2017-206-212.pdf - http://goldhorde.ru/wp-content/uploads/2017/03/%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD-1-2017-206-212.pdf - -Harder/Custom? -- paths including "/../" or "/./" are collapsed -- port number 80, like `http://fermet.misis.ru:80/jour/article/download/724/700` -- aos2.uniba.it:8080papers - -- fragments stripped by crawler: 'https://www.termedia.pl/Journal/-85/pdf-27083-10?filename=BTA#415-06-str307-316.pdf' - -### Debugging "redirect terminal" issue - -Some are redirect loops; fine. 
- -Some are from 'cookieSet=1' redirects, like 'http://journals.sagepub.com/doi/pdf/10.1177/105971230601400206?cookieSet=1'. This comes through like: - - sqlite> select * from crawl_result where initial_uri = 'http://adb.sagepub.com/cgi/reprint/14/2/147.pdf'; - initial_uri identifier breadcrumbs final_uri final_http_status final_sha1 final_mimetype final_was_dedupe final_cdx - http://adb.sagepub.com/cgi/reprint/14/2/147.pdf 10.1177/105971230601400206 R http://journals.sagepub.com/doi/pdf/10.1177/105971230601400206 302 ¤ text/html 0 ¤ - -Using 'http' (note: this is not an OA article): - - http://adb.sagepub.com/cgi/reprint/14/2/147.pdf - https://journals.sagepub.com/doi/pdf/10.1177/105971230601400206 - https://journals.sagepub.com/doi/pdf/10.1177/105971230601400206?cookieSet=1 - http://journals.sagepub.com/action/cookieAbsent - -Is heritrix refusing to do that second redirect? In some cases it will do at -leat the first, like: - - http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.1.1996385 - http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.1.1996385?cookieSet=1 - http://pubs.rsna.org/action/cookieAbsent - -I think the vast majority of redirect terminals are when we redirect to a page -that has already been crawled. This is a bummer because we can't find the -redirect target in the logs. - -Eg, academic.oup.com sometimes redirects to cookieSet, then cookieAbsent; other -times it redirects to Governer. It's important to distinguish between these. - -### Scratch - -What are actual advantages/use-cases of CDX mode? -=> easier CDX-to-WARC output mode -=> sending CDX along with WARCs as an index - -Interested in scale-up behavior: full unpaywall PDF crawls, and/or full DOI landing crawls -=> eatmydata -dentifier is not null - - - zcat UNPAYWALL-PDF-CRAWL-2018-07-PATCH* | time /fast/scratch/unpaywall/arabesque.py referrer - UNPAYWALL-PDF-CRAWL-2018-07-PATCH.map.sqlite - [snip] - ... referrer 5542000 - Referrer map complete. 
- 317.87user 274.57system 21:20.22elapsed 46%CPU (0avgtext+0avgdata 22992maxresident)k - 24inputs+155168464outputs (0major+802114minor)pagefaults 0swaps - - bnewbold@ia601101$ ls -lathr - -rw-r--r-- 1 bnewbold bnewbold 1.7G Dec 12 12:33 UNPAYWALL-PDF-CRAWL-2018-07-PATCH.map.sqlite - -Scaling! - - 16,736,800 UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.us.archive.org.crawl.log - 17,215,895 unpaywall_20180621.seed_id.tsv - -Oops; need to shard the seed_id file. - -Ugh, this one is a little derp because I didn't sort correctly. Let's say close enough though... - - 4318674 unpaywall_crawl_seedlist.svc282.tsv - 3901403 UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.seed_id.tsv - - -/fast/scratch/unpaywall/arabesque.py everything CORE-UPSTREAM-CRAWL-2018-11.combined.log core_2018-03-01_metadata.seed_id.tsv CORE-UPSTREAM-CRAWL-2018-11.out.sqlite - - Counter({'inserted': 3226191, 'skip-log-scope': 2811395, 'skip-log-prereq': 108932, 'skip-tiny-octetstream-': 855, 'skip-map-scope': 2}) - Counter({'existing-id-updated': 3221984, 'inserted': 809994, 'existing-complete': 228909, '_normalized-seed-uri': 17287, 'map-uri-missing': 2511, '_redirect-recursion-limit': 221, 'skip-bad-seed-uri': 17}) - -time /fast/scratch/unpaywall/arabesque.py everything UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.us.archive.org.crawl.log UNPAYWALL-PDF-CRAWL-2018-07.wbgrp-svc282.seed_id.tsv UNPAYWALL-PDF-CRAWL-2018-07.out.sqlite - - Everything complete! 
- Counter({'skip-log-scope': 13476816, 'inserted': 2536452, 'skip-log-prereq': 682460, 'skip-tiny-octetstream-': 41067}) - Counter({'existing-id-updated': 1652463, 'map-uri-missing': 1245789, 'inserted': 608802, 'existing-complete': 394349, '_normalized-seed-uri': 22573, '_redirect-recursion-limit': 157}) - - real 63m42.124s - user 53m31.007s - sys 6m50.535s - -### Performance - -Before tweaks: - - real 2m55.975s - user 2m6.772s - sys 0m12.684s - -After: - - real 1m51.500s - user 1m44.600s - sys 0m3.496s +There aren't many tests, but what there is can be run with: + pytest-3 arabesque.py diff --git a/arabesque.py b/arabesque.py index a4b3e07..afb2180 100755 --- a/arabesque.py +++ b/arabesque.py @@ -12,7 +12,7 @@ Commands/modes: - forward - everything -Design docs in README_pdf_crawl.md +Design docs in DESIGN.md TODO: - open map in read-only when appropriate diff --git a/examples/report_template.md b/examples/report_template.md new file mode 100644 index 0000000..139598b --- /dev/null +++ b/examples/report_template.md @@ -0,0 +1,108 @@ + +# Crawl QA Report + +This crawl report is auto-generated from a sqlite database file, which should be available/included. + +### Seedlist Stats + +```sql +SELECT COUNT(DISTINCT identifier) as identifiers, COUNT(DISTINCT initial_url) as uris, COUNT(DISTINCT initial_domain) AS domains FROM crawl_result; +``` + +FTP seed URLs + +```sql +SELECT COUNT(*) as ftp_urls FROM crawl_result WHERE initial_url LIKE 'ftp://%'; +``` + +### Successful Hits + +```sql +SELECT COUNT(DISTINCT identifier) as identifiers, COUNT(DISTINCT initial_url) as uris, COUNT(DISTINCT final_sha1) as unique_sha1 FROM crawl_result WHERE hit=1; +``` + +De-duplication percentage (aka, fraction of hits where content had been crawled and identified previously): + +```sql +# AVG() hack! +SELECT 100. 
* AVG(final_was_dedupe) as percent FROM crawl_result WHERE hit=1; +``` + +Top mimetypes for successful hits (these are usually filtered to a fixed list in post-processing): + +```sql +SELECT final_mimetype, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY final_mimetype ORDER BY COUNT(*) DESC LIMIT 10; +``` + +Most popular breadcrumbs (a measure of how hard the crawler had to work): + +```sql +SELECT breadcrumbs, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY breadcrumbs ORDER BY COUNT(*) DESC LIMIT 10; +``` + +FTP vs. HTTP hits (200 is HTTP, 226 is FTP): + +```sql +SELECT final_status_code, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY final_status_code LIMIT 10; +``` + +### Domain Summary + +Top *initial* domains: + +```sql +SELECT initial_domain, COUNT(*), 100. * COUNT(*) / (SELECT COUNT(*) FROM crawl_result) as percent FROM crawl_result GROUP BY initial_domain ORDER BY count(*) DESC LIMIT 20; +``` + +Top *successful, final* domains, where hits were found: + +```sql + +SELECT final_domain, COUNT(*), 100. 
* COUNT(*) / (SELECT COUNT(*) FROM crawl_result WHERE hit=1) AS percent FROM crawl_result WHERE hit=1 GROUP BY final_domain ORDER BY COUNT(*) DESC LIMIT 20; +``` + +Top *non-successful, final* domains where crawl paths terminated before a successful hit (but crawl did run): + +```sql +SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code IS NOT NULL GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; +``` + +Top *uncrawled, initial* domains, where the crawl didn't even attempt to run: + +```sql +SELECT initial_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code IS NULL GROUP BY initial_domain ORDER BY count(*) DESC LIMIT 20; +``` + +Top *blocked, final* domains: + +```sql +SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND (final_status_code='-61' OR final_status_code='-2') GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; +``` + +Top *rate-limited, final* domains: + +```sql +SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code='429' GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; +``` + +### Status Summary + +Top failure status codes: + +```sql + SELECT final_status_code, COUNT(*) FROM crawl_result WHERE hit=0 GROUP BY final_status_code ORDER BY count(*) DESC LIMIT 10; +``` + +### Example Results + +A handful of random success lines: + +```sql + SELECT identifier, initial_url, breadcrumbs, final_url, final_sha1, final_mimetype FROM crawl_result WHERE hit=1 ORDER BY random() LIMIT 10; +``` + +Handful of random non-success lines: + +```sql + SELECT identifier, initial_url, breadcrumbs, final_url, final_status_code, final_mimetype FROM crawl_result WHERE hit=0 ORDER BY random() LIMIT 25; +``` diff --git a/examples/seed_doi.tsv b/examples/seed_doi.tsv new file mode 100644 index 0000000..2ca21f8 --- /dev/null +++ b/examples/seed_doi.tsv @@ -0,0 +1,494 @@ +http://140.115.82.191/old/warehouse/sigmod98.ps 10.1145/276305.276329 
+http://3morduc.googlecode.com/svn-history/r261/trunk/doc/simulatore_privitera.pdf 10.1145/1180495.1180544 +http://academypublisher.net/jcm/vol04/no02/jcm0402119125.pdf 10.4304/jcm.4.2.119-125 +http://academypublisher.net/jcm/vol04/no04/jcm0404257266.pdf 10.4304/jcm.4.4.257-266 +http://academypublisher.net/jnw/vol04/no06/jnw0406436444.pdf 10.4304/jnw.4.6.436-444 +http://advan.physiology.org/content/ajpadvan/27/3/156.1.full.pdf 10.1152/advan.00015.2003 +http://advan.physiology.org/content/ajpadvan/36/4/336.full.pdf 10.1152/advan.00050.2012 +http://ajpcell.physiology.org/content/ajpcell/289/1/C230.full.pdf 10.1152/ajpcell.00069.2005 +http://ajpcell.physiology.org/content/ajpcell/292/5/C1874.full.pdf 10.1152/ajpcell.00617.2006 +http://ajpcell.physiology.org/content/ajpcell/294/4/C879.full.pdf 10.1152/ajpcell.00490.2007 +http://ajpcell.physiology.org/content/ajpcell/294/6/C1419.full.pdf 10.1152/ajpcell.00413.2007 +http://ajpcell.physiology.org/content/ajpcell/303/9/C924.full.pdf 10.1152/ajpcell.00459.2011 +http://ajpcell.physiology.org/content/ajpcell/305/6/C601.full.pdf 10.1152/ajpcell.00042.2013 +http://ajpcell.physiology.org/content/ajpcell/306/12/C1108.full.pdf 10.1152/ajpcell.00205.2013 +http://ajpcell.physiology.org/content/ajpcell/311/5/C805.full.pdf 10.1152/ajpcell.00279.2016 +http://ajpgi.physiology.org/content/ajpgi/285/6/G1091.full.pdf 10.1152/ajpgi.00193.2003 +http://ajpgi.physiology.org/content/ajpgi/287/6/G1238.full.pdf 10.1152/ajpgi.00471.2003 +http://ajpgi.physiology.org/content/ajpgi/292/3/G706.full.pdf 10.1152/ajpgi.00347.2006 +http://ajpgi.physiology.org/content/ajpgi/293/2/G510.full.pdf 10.1152/ajpgi.00102.2007 +http://ajpgi.physiology.org/content/ajpgi/294/1/G184.full.pdf 10.1152/ajpgi.00348.2007 +http://ajpgi.physiology.org/content/ajpgi/294/3/G728.full.pdf 10.1152/ajpgi.00002.2007 +http://ajpgi.physiology.org/content/ajpgi/296/3/G651.full.pdf 10.1152/ajpgi.90387.2008 +http://ajpgi.physiology.org/content/ajpgi/297/6/G1198.full.pdf 
10.1152/ajpgi.00168.2009 +http://ajpgi.physiology.org/content/ajpgi/300/4/G637.full.pdf 10.1152/ajpgi.00381.2010 +http://ajpgi.physiology.org/content/ajpgi/304/10/G897.full.pdf 10.1152/ajpgi.00160.2012 +http://ajpgi.physiology.org/content/ajpgi/305/12/G881.full.pdf 10.1152/ajpgi.00289.2013 +http://ajpgi.physiology.org/content/ajpgi/307/2/G229.full.pdf 10.1152/ajpgi.00424.2013 +http://ajpgi.physiology.org/content/ajpgi/307/4/G471.full.pdf 10.1152/ajpgi.00156.2014 +http://ajpgi.physiology.org/content/ajpgi/307/7/G732.full.pdf 10.1152/ajpgi.00073.2014 +http://ajpheart.physiology.org/content/ajpheart/283/1/H53.full.pdf 10.1152/ajpheart.01057.2001 +http://ajpheart.physiology.org/content/ajpheart/286/3/H1208.full.pdf 10.1152/ajpheart.00011.2003 +http://ajpheart.physiology.org/content/ajpheart/295/2/H874.full.pdf 10.1152/ajpheart.01189.2007 +http://ajpheart.physiology.org/content/ajpheart/295/3/H1044.full.pdf 10.1152/ajpheart.00516.2008 +http://ajpheart.physiology.org/content/ajpheart/296/2/H380.full.pdf 10.1152/ajpheart.00225.2008 +http://ajpheart.physiology.org/content/ajpheart/298/6/H1857.full.pdf 10.1152/ajpheart.00754.2009 +http://ajpheart.physiology.org/content/ajpheart/307/11/H1587.full.pdf 10.1152/ajpheart.00557.2014 +http://ajpheart.physiology.org/content/ajpheart/309/11/H1964.full.pdf 10.1152/ajpheart.00055.2015 +http://ajp.psychiatryonline.org/data/Journals/AJP/2636/881.pdf 10.1176/ajp.118.10.881 +http://ajp.psychiatryonline.org/data/Journals/AJP/2636/881.pdf 10.1515/9783110816228.560 +http://ajp.psychiatryonline.org/data/Journals/AJP/3387/559.pdf 10.1176/ajp.142.5.559 +http://ajp.psychiatryonline.org/data/Journals/AJP/3582/1040.pdf 10.1176/ajp.149.8.1040 +http://ajpregu.physiology.org/content/ajpregu/282/6/R1718.full.pdf 10.1152/ajpregu.00651.2001 +http://ajpregu.physiology.org/content/ajpregu/284/6/R1551.full.pdf 10.1152/ajpregu.00519.2002 +http://ajpregu.physiology.org/content/ajpregu/294/1/R52.full.pdf 10.1152/ajpregu.00635.2007 
+http://ajpregu.physiology.org/content/ajpregu/294/6/R1813.full.pdf 10.1152/ajpregu.00178.2008 +http://ajpregu.physiology.org/content/ajpregu/298/3/R720.full.pdf 10.1152/ajpregu.00619.2009 +http://ajpregu.physiology.org/content/ajpregu/299/5/R1183.full.pdf 10.1152/ajpregu.00212.2010 +http://ajpregu.physiology.org/content/ajpregu/309/8/R875.full.pdf 10.1152/ajpregu.00258.2015 +http://ants.dif.um.es/staff/pedrom/papers/Ruiz-AdHocNow05.pdf 10.1007/11561354_22 +http://ants.dif.um.es/staff/pedrom/papers/Ruiz-PIMRC04.pdf 10.1109/pimrc.2004.1370936 +http://archive.dstc.edu.au/AU/staff/kerry-raymond/missing-link.pdf 10.1007/3-540-45832-8_9 +http://archive.dstc.edu.au/FastWeb/papers/iwqos99.ps.gz 10.1109/iwqos.1999.766478 +http://atvs.ii.uam.es/files/2007_SPIE_NameLegibility_Galbally.pdf 10.1117/12.719236 +http://atvs.ii.uam.es/files/2010_SMC_A_Campisi.pdf 10.1109/tsmca.2010.2041653 +http://banglajol.ubiquity.press/index.php/BJMS/article/download/18477/18190 10.3329/bjms.v15i1.18477 +http://banglajol.ubiquity.press/index.php/BJMS/article/download/27177/18241 10.3329/bjms.v15i1.27177 +http://bit.csc.lsu.edu/%7Eiyengar/images/publications/qam.pdf 10.1109/glocom.2010.5684177 +http://brainimaging.waisman.wisc.edu/~jjo/downloads/hbm09_pathlength.pdf 10.1016/s1053-8119(09)71162-0 +http://brainimaging.waisman.wisc.edu/~perlman/papers/Self/WatkinsTeasdaleAdaptiveMalSelfFocusDepression2004.pdf 10.1016/j.jad.2003.10.006 +http://brainimaging.waisman.wisc.edu/publications/2000/Six-month_test-retest.pdf 10.1002/(sici)1097-0193(200005)10:1<1::aid-hbm10>3.0.co;2-o +http://brainimaging.waisman.wisc.edu/publications/2004/Buss_ContextSpecific_Freezing_DevPsy.pdf 10.1037/0012-1649.40.4.583 +http://business.illinois.edu/aguilera/Teaching/ICC/Enriques_Volpin_2007_JOEP_CorpGovReform.pdf 10.1257/jep.21.1.117 +http://cer.sagepub.com/cgi/reprint/10/3/239.pdf 10.1177/106329302761689151 +http://cer.sagepub.com/cgi/reprint/10/4/335.pdf 10.1177/a032003 +http://cer.sagepub.com/cgi/reprint/11/3/221.pdf 
10.1177/106329303038027 +http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.337.8390&rep=rep1&type=pdf 10.1109/lcomm.2012.120312.121675 +http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.650.6824&rep=rep1&type=pdf 10.1017/s0022112002001635 +http://classes.maxwell.syr.edu/ecn611/Tirole99.pdf 10.1111/1468-0262.00052 +http://computing-reports.open.ac.uk/index.php/content/download/203/1213/file/TR2006_05.pdf 10.1017/s1351324906004104 +http://darkwing.uoregon.edu/~kantor/PAPERS/spansymmetric.ps 10.1515/advg.2002.006 +http://disi.unitn.it/locigno/preprints/LoSaZso04-1.pdf 10.1109/glocom.2004.1378336 +http://dissertations.ub.rug.nl/FILES/faculties/medicine/2000/a.a.geertsema/c5.pdf 10.1016/s0142-9612(99)00102-7 +http://dissertations.ub.rug.nl/FILES/faculties/medicine/2004/f.h.w.jungbauer/c8.pdf 10.1111/j.0105-1873.2004.00422.x +http://dissertations.ub.rug.nl/FILES/faculties/medicine/2004/g.dijkstra/c4.pdf 10.1080/00365520252903099 +http://dissertations.ub.rug.nl/FILES/faculties/medicine/2009/r.de.vries/05_c5.pdf 10.1016/j.atherosclerosis.2006.12.027 +http://dissertations.ub.rug.nl/FILES/faculties/science/2001/b.w.a.van.der.strate/c1.pdf 10.1016/s0166-3542(01)00195-4 +http://download.atlantis-press.com/php/download_paper.php?id=22878 10.2991/ipemec-15.2015.171 +http://download.atlantis-press.com/php/download_paper.php?id=25860593 10.2991/sschd-16.2016.23 +http://download.atlantis-press.com/php/download_paper.php?id=25861156 10.2991/ieesasm-16.2016.35 +http://download.atlantis-press.com/php/download_paper.php?id=25874773 10.2991/emehss-17.2017.1 +http://download.atlantis-press.com/php/download_paper.php?id=25882538 10.2991/icmmcce-17.2017.158 +http://download.atlantis-press.com/php/download_paper.php?id=4295 10.2991/iccia.2012.347 +http://drona.csa.iisc.ernet.in/~shalabh/pubs/paper_final.pdf 10.1109/cdc.2007.4434595 +http://editorial.upgto.edu.mx/index.php/umr/article/download/39/65 10.18583/umr.v1i3.39 
+http://ejournal.anu.edu.au/index.php/bippa/article/download/26/23/ 10.7152/bippa.v27i0.11970 +http://ejournal.anu.edu.au/index.php/bippa/article/download/682/647/ 10.7152/bippa.v29i0.9470 +http://eprints.adm.unipi.it/2370/1/TR_ABS_20170605.pdf 10.1109/tgcn.2017.2714205 +http://eprints.adm.unipi.it/694/1/Markanday.pdf 10.1016/j.eurpolymj.2009.12.024 +http://eres.bus.umich.edu/docs/workpap-dav/wp351.pdf 10.1504/ijeim.2005.006530 +http://eres.bus.umich.edu/docs/workpap-dav/wp351.pdf 10.2139/ssrn.258198 +http://eres.bus.umich.edu/docs/workpap-dav/wp373.pdf 10.18267/j.pep.175 +http://eres.bus.umich.edu/docs/workpap-dav/wp373.pdf 10.2139/ssrn.268746 +http://ftp.ccs.neu.edu/pub/people/wand/papers/wand-clinger-99.ps 10.1017/s0956796801003938 +http://ftp.ccs.neu.edu/pub/people/wand/papers/wand-clinger-99.ps 10.1109/iccl.1998.674169 +http://gromovy.net/igor/papers/mypapers/epel05.pdf 10.1002/cmr.b.20037 +http://gromovy.net/igor/papers/mypapers/gromov01.pdf 10.1006/jmre.2001.2298 +http://hal-ujm.ccsd.cnrs.fr/docs/00/11/69/96/PDF/SPIE_denis05_copyright_SPIE.pdf 10.1117/12.617405 +http://hct.ece.ubc.ca/publications/pdf/fels-hinton-1998.pdf 10.1109/72.623199 +http://hct.ece.ubc.ca/publications/pdf/fels-hinton-1998.pdf 10.1109/72.655042 +http://hct.ece.ubc.ca/publications/pdf/fels-mase-1999.pdf 10.1016/s0097-8493(99)00037-0 +http://hct.ece.ubc.ca/publications/pdf/oleinikov-etal-wacv2014.pdf 10.1109/wacv.2014.6836036 +http://hct.ece.ubc.ca/publications/pdf/vogt-chen-hoskinson-fels-SIGGRAPH2004.pdf 10.1145/1186223.1186268 +http://home.cerge-ei.cz/tkonecny/research/textjuly22.pdf 10.1787/9789264177055-6-en +http://home.ie.cuhk.edu.hk/~mhchen/papers/SHOFA.Allerton.12.pdf 10.1109/allerton.2012.6483298 +http://home.ie.cuhk.edu.hk/~mhchen/papers/SHOFA.Allerton.12.pdf 10.1109/tit.2015.2466601 +http://home.ie.cuhk.edu.hk/~mhchen/papers/sub.graph.opt.netcod.2011.pdf 10.1109/isnetcod.2011.5979092 +http://home.ie.cuhk.edu.hk/~mhchen/papers/TCP.over.ACSMA.IWQoS.11.pdf 
10.1109/iwqos.2011.5931354 +http://ict.usc.edu/publications/conf2001_1.pdf 10.1109/aspaa.2001.969541 +http://ict.usc.edu/pubs/Interactants%E2%80%99+Most+Intimate+Self-Disclosure+in+Interactions+with+Virtual+Humans.pdf 10.1007/978-3-642-04380-2_66 +http://iis-db.stanford.edu/pubs/21639/chinese_suburban-dairy_production_working_paper.pdf 10.1111/j.1574-0862.2007.00220.x +http://iis-db.stanford.edu/pubs/22520/AHPPwp8.pdf 10.1007/s10754-009-9067-1 +http://iis-db.stanford.edu/pubs/22520/AHPPwp8.pdf 10.2139/ssrn.1658158 +http://intercarto.msu.ru/jour/article/download/140/142 10.24057/2414-9179-2014-1-20-632-641 +http://intercarto.msu.ru/jour/article/download/22/23 10.24057/2414-9179-2013-1-19-129-146 +http://intercarto.msu.ru/jour/article/download/515/377 10.24057/2414-9179-2017-2-23-26-33 +http://iopscience.iop.org/article/10.1086/379852/pdf 10.1086/379852 +http://iopscience.iop.org/article/10.1086/587483/pdf 10.1086/587483 +http://iopscience.iop.org/article/10.1086/589152/pdf 10.1086/589152 +http://iopscience.iop.org/article/10.1086/656322/pdf 10.1086/656322 +http://iopscience.iop.org/article/10.1088/0957-0233/27/6/069501/pdf 10.1088/0957-0233/27/6/069501 +http://iopscience.iop.org/article/10.1088/1126-6708/2002/12/040/pdf 10.1088/1126-6708/2002/12/040 +http://iopscience.iop.org/article/10.1088/1361-6463/aa977e/ampdf 10.1088/1361-6463/aa977e +http://iopscience.iop.org/article/10.1088/1742-6596/522/1/012054/pdf 10.1088/1742-6596/522/1/012054 +http://iopscience.iop.org/article/10.3847/1538-4357/aaae64/pdf 10.3847/1538-4357/aaae64 +http://jap.physiology.org/content/jap/104/5/1452.full.pdf 10.1152/japplphysiol.00021.2008 +http://jap.physiology.org/content/jap/107/5/1548.full.pdf 10.1152/japplphysiol.00622.2009 +http://jap.physiology.org/content/jap/109/4/1125.full.pdf 10.1152/japplphysiol.00316.2010 +http://jap.physiology.org/content/jap/112/9/1474.full.pdf 10.1152/japplphysiol.01477.2011 +http://jap.physiology.org/content/jap/118/12/1449.full.pdf 
10.1152/japplphysiol.00269.2015 +http://jap.physiology.org/content/jap/99/2/665.full.pdf 10.1152/japplphysiol.00624.2004 +http://jn.physiology.org/content/82/2/736.full.pdf 10.1152/jn.1999.82.2.736 +http://jn.physiology.org/content/jn/104/2/1177.full.pdf 10.1152/jn.00032.2010 +http://jn.physiology.org/content/jn/112/6/1491.full.pdf 10.1152/jn.00437.2014 +http://jn.physiology.org/content/jn/116/1/88.full.pdf 10.1152/jn.00663.2015 +http://jn.physiology.org/content/jn/89/1/12.full.pdf 10.1152/jn.00416.2002 +http://jn.physiology.org/content/jn/92/4/2168.full.pdf 10.1152/jn.00103.2004 +http://jn.physiology.org/content/jn/93/1/128.full.pdf 10.1152/jn.01002.2003 +http://jn.physiology.org/content/jn/93/1/548.full.pdf 10.1152/jn.00253.2004 +http://jn.physiology.org/content/jn/94/1/852.full.pdf 10.1152/jn.00976.2004 +http://jn.physiology.org/content/jn/94/3/1781.full.pdf 10.1152/jn.01253.2004 +http://jn.physiology.org/content/jn/94/6/4471.full.pdf 10.1152/jn.00527.2005 +http://jn.physiology.org/content/jn/98/6/3185.full.pdf 10.1152/jn.00189.2007 +http://jolissrch-inter.tokai-sc.jaea.go.jp/pdfdata/AA2009-0977.pdf 10.1063/1.3294620 +http://jolissrch-inter.tokai-sc.jaea.go.jp/pdfdata/AA2013-0281.pdf 10.1504/ijnkm.2014.058927 +http://jormonline.com/index.php/jorm/article/download/183/pdf_44 10.17722/jorm.v7i2.183 +http://jormonline.com/index.php/jorm/article/download/223/pdf_61 10.17722/jorm.v8i3.223 +http://journals.sagepub.com/doi/pdf/10.1177/1087057114548832 10.1177/1087057114548832 +http://journals.sagepub.com/doi/pdf/10.1177/172460080401900202 10.1177/172460080401900202 +http://journals.sagepub.com/doi/pdf/10.3317/jraas.2007.025 10.3317/jraas.2007.025 +http://jsat.ewi.tudelft.nl/content/volume3/JSAT3_9_Sebastiani.pdf 10.1007/978-3-319-10575-8_11 +http://lamar.colostate.edu/~hrolston/Bio-Phil-Yellowstone.pdf 10.1007/bf00127491 +http://linux46.ma.utexas.edu/mp_arc/c/95/95-537.ps.gz 10.1007/978-1-4612-1246-1_14 +http://linux46.ma.utexas.edu/mp_arc/c/95/95-537.ps.gz 
10.1007/bf02677977 +http://linux46.ma.utexas.edu/mp_arc/c/95/95-537.ps.gz 10.1007/s003329900029 +http://lnfp.dr18.cnrs.fr/publication_boucart/diazepam_attention.pdf 10.1037/1064-1297.15.1.115 +http://math.cofc.edu/faculty/jin/research/monad.pdf 10.2307/2275288 +http://math.cofc.edu/faculty/jin/research/nsaadd.ps 10.2307/421059 +http://mathro.fpms.ac.be/~fortemps/ftp/mofac.ps.gz 10.1007/978-3-7908-1848-2_2 +http://numerix.univ-lyon1.fr/~andro/Article/forn.ps.gz 10.1016/s0362-546x(98)00346-0 +http://numerix.univ-lyon1.fr/~clopeau/Article/clopeaud.ps.gz 10.1088/0951-7715/11/6/011 +http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2125.1990.tb03828.x/pdf 10.1111/j.1365-2125.1990.tb03828.x +http://onlinelibrary.wiley.com/doi/10.1111/j.1471-0528.1997.tb14350.x/pdf 10.1111/j.1471-0528.1997.tb14350.x +http://oru.diva-portal.org/smash/get/diva2:1114157/FULLTEXT01 10.1108/tg-01-2017-0007 +http://oru.diva-portal.org/smash/get/diva2:138559/FULLTEXT01 10.1109/iros.2007.4399381 +http://pdfs.journals.lww.com/cardiovascularpharm/1989/00133/Steady_State_Kinetics_of_Ramipril_in_Renal.13.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1503489426645;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzdlt+0kYOWG0lQj/LwiLR6d1iIQWbmtDaZhZB9HWDmY6QALXYZAVOGqxlHN5vxAXyD;hash|w1DU6tcLCC69J/vIE7QDPg== 10.1097/00005344-198900133-00013 
+http://pdfs.journals.lww.com/innovjournal/2016/07000/Edwards_SAPIEN_XT_in_Native_Stenotic_Mitral_Valve,.10.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1506819921460;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzdmTT46oelHR/qQ57ZOB9FcQ5Bttxm46V5ouKTSUnncMJukCBGBsOrWOFmxxHAF7bA;hash|OaSoJkPBQktbcKyUBoGIUw== 10.1097/imi.0000000000000272 +http://pdfs.journals.lww.com/jbjsjournal/2014/10150/Which_One_Could_Be_Managed___Commentary_on_an.16.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1506163136053;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzd3q8SQSgPZSBPpzHLmT+Eu2FA0aaZw4RooCJTgCBD2eBTn6KV8ImP8BAxdgWwlJ1+;hash|YWI7bSQwOS1wDMcsKXRg2w== 10.2106/jbjs.n.00751 +http://pdfs.journals.lww.com/jbjsjournal/2017/10180/Using_Clinical_Outcomes_to_Improve_Preclinical.13.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1508471090239;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzd3q8SQSgPZSBPpzHLmT+Eu2KhpjCsgf0QcrHRS/NaAEwe2UO1JrSvWyvbnhqgo1tS;hash|KVLDazjtxdnY9fstSU5jTQ== 10.2106/jbjs.e.00851 
+http://pdfs.journals.lww.com/optvissci/1955/11000/TV_RETINOSCOPY__.7.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1500441332101;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzdeUvS3M/aRHfZJBMS25ybl9oOeVRYKVsy4cH0CAeB//c=;hash|bmlYTcgZimMausjQtQtFfg== 10.1097/00006324-195511000-00007 +http://pdfs.journals.lww.com/topicsinlanguagedisorders/2015/01000/Issue_Editor_Foreword___The_Road_Less_Traveled___.3.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1503504263140;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzd3MwzatPPmExRIoN1wxhkgI6B/aGOV1UHrF6gSWIPu9mGfAKXqMAPjWnVRKdAaOsJ;hash|MLGOsVYh+Xcxckd+HWuDGg== 10.1097/tld.0000000000000047 +http://pdfs.journals.lww.com/transplantjournal/2008/05150/Induction_Immunosuppression_With_Thymoglobulin_and.13.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1508434680116;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzdGlb2qsojlvlytk14LkMXSBH8DKEbJi+muPNgdTx0qByN5IA4bWb87rrtIZYjtb8n;hash|o4zg1v/t3q5DlDkcm3axoA== 10.1097/tp.0b013e31816dd450 +http://pdl.cmu.edu/PDL-FTP/CloudComputing/a16-xu.pdf 10.1145/2668129 +http://personnel.mcgill.ca/files/markus.poschke/poschke_entrycost_productivity.pdf 10.1111/j.1468-0297.2010.02367.x +http://physiolgenomics.physiology.org/content/physiolgenomics/26/1/91.full.pdf 10.1152/physiolgenomics.00296.2005 
+http://physiolgenomics.physiology.org/content/physiolgenomics/30/2/123.full.pdf 10.1152/physiolgenomics.00190.2006 +http://physiolgenomics.physiology.org/content/physiolgenomics/32/1/142.full.pdf 10.1152/physiolgenomics.00258.2006 +http://physiolgenomics.physiology.org/content/physiolgenomics/40/3/216.full.pdf 10.1152/physiolgenomics.zh7-3421-corr.2010 +http://physiolgenomics.physiology.org/content/physiolgenomics/42/1/67.full.pdf 10.1152/physiolgenomics.00174.2009 +http://physiolgenomics.physiology.org/content/physiolgenomics/43/21/1241.full.pdf 10.1152/physiolgenomics.00086.2011 +http://positron.physik.uni-halle.de/pbl/PA6-2.pdf 10.1002/app.10319 +http://publications.drdo.gov.in/ojs/index.php/dsj/article/download/1184/492 10.14429/dsj.61.1184 +http://publications.drdo.gov.in/ojs/index.php/dsj/article/download/6437/3495 10.14429/dsj.30.6437 +http://pubs.rsna.org/doi/pdf/10.1148/108.3.563 10.1148/108.3.563 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.1.1996393 10.1148/radiographics.11.1.1996393 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.11.3.1852942 10.1148/radiographics.11.3.1852942 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.12.3.1609142 10.1148/radiographics.12.3.1609142 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.15.6.8577962 10.1148/radiographics.15.6.8577962 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.16.4.8835987 10.1148/radiographics.16.4.8835987 +http://pubs.rsna.org/doi/pdf/10.1148/radiographics.17.5.9308113 10.1148/radiographics.17.5.9308113 +http://research.edm.uhasselt.be/~kris/research/publications/cams2006/cams06.pdf 10.1007/11915072_105 +http://research.nokia.com/files/NavigationWAPvsWeb.pdf 10.1145/642611.642669 +http://research.nokia.com/files/NavigationWAPvsWeb.pdf 10.1145/642667.642669 +http://research.nokia.com/files/RollingHistory.pdf 10.1109/iciw.2008.13 +http://revistas.usantotomas.edu.co/index.php/hallazgos/article/download/1591/1751 10.15332/s1794-3841.2004.0002.11 
+https://academic.oup.com/bioinformatics/article-pdf/14/9/821/9731962/140821.pdf 10.1093/bioinformatics/14.9.821 +https://academic.oup.com/biolreprod/article-pdf/60/6/1285/10591108/biolreprod1285.pdf 10.1095/biolreprod60.6.1285 +https://academic.oup.com/cardiovascres/article-pdf/79/1/97/17391533/cvn073.pdf 10.1093/cvr/cvn073 +https://academic.oup.com/cid/advance-article-pdf/doi/10.1093/cid/cix1151/23553522/cix1151.pdf 10.1093/cid/cix1151 +https://academic.oup.com/europace/article-pdf/7/s1/149/8868969/149a.pdf 10.1093/europace/7.s1.149 +https://academic.oup.com/eurpub/article-pdf/14/suppl_1/9/1407774/14s10009b.pdf 10.1093/eurpub/14.suppl_1.9-b +https://academic.oup.com/gerontologist/article-pdf/56/Suppl_3/570/7933522/gnw162.2288.pdf 10.1093/geront/gnw162.2288 +https://academic.oup.com/mbe/article-pdf/25/4/617/4094308/msn020.pdf 10.1093/molbev/msn020 +https://academic.oup.com/schizophreniabulletin/article-pdf/39/6/NP/13567626/sbs184.pdf 10.1093/schbul/sbs184 +http://sammelpunkt.philo.at:8080/679/1/9102.0.antisemi.pdf 10.1017/s0003975600006536 +http://sci2s.ugr.es/publications/ficheros/chica-etal-IEA-AIE-LNAI-6098-656-665.pdf 10.1007/978-3-642-13033-5_67 +https://digital.library.unt.edu/ark:/67531/metadc704352/m2/1/high_res_d/73948.pdf 10.2172/73948 +https://digital.library.unt.edu/ark:/67531/metadc723847/m2/1/high_res_d/777715.pdf 10.1063/1.1384386 +https://files.eccomasproceedia.org/papers/compdyn-2013/C1181.pdf?mtime=20170330153901 10.7712/120113.4766.c1181 +https://files.eccomasproceedia.org/papers/compdyn-2015/1114.pdf?mtime=20170329172028 10.7712/120115.3482.1114 +https://files.eccomasproceedia.org/papers/compdyn-2017/18010.pdf?mtime=20171002111627 10.7712/120117.5608.18010 +https://files.eccomasproceedia.org/papers/eccomas-congress-2016/4678.pdf?mtime=20170308164728 10.7712/100016.2360.4678 +https://files.eccomasproceedia.org/papers/eccomas-congress-2016/6566.pdf?mtime=20170308164900 10.7712/100016.2436.6566 
+https://files.eccomasproceedia.org/papers/eccomas-congress-2016/7593.pdf?mtime=20170308164959 10.7712/100016.2339.7593 +https://files.eccomasproceedia.org/papers/eccomas-congress-2016/8613.pdf?mtime=20170308165111 10.7712/100016.2380.8613 +https://hal.archives-ouvertes.fr/hal-01480381/file/rebouah2014.pdf 10.1007/s11012-014-0065-0 +http://sim.sagepub.com/cgi/reprint/78/9/552.pdf 10.1177/0037549702078009003 +http://sim.sagepub.com/cgi/reprint/79/1/43.pdf 10.1177/0037549703079001004 +http://sim.sagepub.com/cgi/reprint/83/4/347.pdf 10.1177/0037549707083114 +https://link.springer.com/content/pdf/10.1007%2FBF02907787.pdf 10.1007/bf02907787 +https://link.springer.com/content/pdf/10.1007%2FBF03016672.pdf 10.1007/bf03016672 +https://link.springer.com/content/pdf/10.1007%2Fs00520-016-3183-5.pdf 10.1007/s00520-016-3183-5 +https://link.springer.com/content/pdf/10.1007%2Fs11298-013-0228-7.pdf 10.1007/s11298-013-0228-7 +https://link.springer.com/content/pdf/10.1007%2Fs12498-017-0083-7.pdf 10.1007/s12498-017-0083-7 +https://link.springer.com/content/pdf/10.1007%2Fs15006-016-8681-3.pdf 10.1007/s15006-016-8681-3 +https://link.springer.com/content/pdf/10.3758%2FBF03332477.pdf 10.3758/bf03332477 +https://link.springer.com/content/pdf/10.5047%2Feps.2013.07.002.pdf 10.5047/eps.2013.07.002 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/000155500750012298 10.1080/000155500750012298 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/00015550252948293 10.1080/00015550252948293 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/00015550260132460 10.1080/00015550260132460 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/00015550410015021 10.1080/00015550410015021 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/00015550410035506 10.1080/00015550410035506 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/000155598443150 10.1080/000155598443150 
+https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/000155599750011390 10.1080/000155599750011390 +https://medicaljournals.se/acta/content_files/download.php?doi=10.1080/000155599750011543 10.1080/000155599750011543 +https://medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-0682 10.2340/00015555-0682 +https://medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-0346 10.2340/16501977-0346 +https://naukaru.ru/upload/3b1930e4b44266e6c865f7ace0a388cb/files/c79fcee6e23e75871cbcdae3c16c2349.pdf 10.12737/article_59c2218b364919.10451231 +https://naukaru.ru/upload//4a704431a460bf5e4e83a493453bf007/files/9242f702ea1810742469d6676b8b3aa8.pdf 10.12737/textbook_591c08bd062351.82204678 +https://naukaru.ru/upload/54847727897c76ae86f389a846277ae4/files/33ae183a2b6ae714329b682a807e6996.pdf 10.12737/article_593610d7e01601.48990640 +https://naukaru.ru/upload/54847727897c76ae86f389a846277ae4/files/8101f802d2511ae026c50d920e0b3e88.pdf 10.12737/article_59acae29427229.09098671 +https://naukaru.ru/upload/54847727897c76ae86f389a846277ae4/files/9f618e8dbb5eba0ab5e6e4f134306386.pdf 10.12737/article_5a24a1718d3b44.24529868 +https://naukaru.ru/upload/6e66237b1c7e3dbdace2ba2b2c66689c/files/71f7519e8e206a340786f04b448825fa.pdf 10.22412/1999-5644-11-4-9 +https://naukaru.ru/upload//7fd3f86c299d8e1ce467f949bdfec858/files/a4dbcfb7161840b49f4e1f79f66added.pdf 10.12737/10403 +https://naukaru.ru/upload//7fd3f86c299d8e1ce467f949bdfec858/files/bd0da773d3b6276e191d3b81858d244a.pdf 10.12737/19856 +https://naukaru.ru/upload//bab5cbd20a12c9cf6b31a482e8c81cfc/files/08a36cba97e15706b772305d35866f4b.pdf 10.12737/article_590823a5489433.14864804 +https://naukaru.ru/upload/f9c79fc3294eae5f36c7499ef8f3a242/files/27935d3b99172e7ae7cf70c32ef1a5c2.pdf 10.12737/article_58fdaaf1a0a9a1.62886932 +https://naukaru.ru/upload/f9c79fc3294eae5f36c7499ef8f3a242/files/bd6a6aa2ddc3d3b02ba8955d9fb36e2a.pdf 10.12737/21719 
+https://naukaru.ru/upload/f9c79fc3294eae5f36c7499ef8f3a242/files/fc31d490887a6e6607b957853da1a47a.pdf 10.12737/24988 +https://openknowledge.worldbank.org/bitstream/handle/10986/4432/wbro_25_1_1.pdf?sequence=1 10.1093/wbro/lkp015 +https://pdfs.journals.lww.com/ejanaesthesiology/2005/05001/The_effect_of_psoas_compartment_block_on.418.pdf?token=method|ExpireAbsolute;source|Journals;ttl|1516613884108;payload|mY8D3u1TCCsNvP5E421JYK6N6XICDamxByyYpaNzk7FKjTaa1Yz22MivkHZqjGP4kdS2v0J76WGAnHACH69s21Csk0OpQi3YbjEMdSoz2UhVybFqQxA7lKwSUlA502zQZr96TQRwhVlocEp/sJ586aVbcBFlltKNKo+tbuMfL73hiPqJliudqs17cHeLcLbV/CqjlP3IO0jGHlHQtJWcICDdAyGJMnpi6RlbEJaRheGeh5z5uvqz3FLHgPKVXJzdGZnEagBFgfcfP0kYnmKqykYvKo7hq8lXeandQgqLPGGKqWzjtuI3U8a6r7La83oj;hash|XnPXiUi1ApXtswkmtWDg9A== 10.1097/00003643-200505001-00418 +http://suma.ldc.usb.ve/docs/corbasuma1.ps.gz 10.1007/3-540-48228-8_72 +https://www.avma.org/kb/policies/documents/euthanasia.pdf 10.1071/wr14094 +https://www.bancaditalia.it/studiricerche/convegni/atti/fiscal_ind/role/6.pdf 10.2139/ssrn.2005255 +https://www.bancaditalia.it/studiricerche/convegni/atti/fiscal_sustainability/session_2/carone_costello_diezguardia_eckefeldt_mourre.pdf 10.2139/ssrn.1997174 +https://www.bancaditalia.it/studiricerche/convegni/atti/luxembourg/s6/bover.pdf 10.2139/ssrn.1093617 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/154369C726265BE6BB15DA38B79F06B1/S0007125000123669a.pdf/div-class-title-product-research-in-community-and-mental-health-vol-5-edited-by-name-givennames-james-r-givennames-surname-grantly-surname-name-london-jai-press-1985-pp-319-45-70-product-div.pdf 10.1192/s0007125000123669 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/3D0E42B17917FBE3DB9B5D1AEEB62405/S0007114506002066a.pdf/div-class-title-wine-constituents-inhibit-thrombosis-but-not-atherogenesis-in-c57bl-6-apolipoprotein-e-deficient-mice-div.pdf 10.1079/bjn20061818 
+https://www.cambridge.org/core/services/aop-cambridge-core/content/view/4274D6E3BEF2797E9843EF301318B046/S0007125000034863a.pdf/div-class-title-bjp-volume-164-issue-5-cover-and-front-matter-div.pdf 10.1192/s0007125000034863 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/56427824675F25E2AACAE79E9602F91E/S1446788700031815a.pdf/div-class-title-bmo-and-singular-integrals-over-local-fields-div.pdf 10.1017/s1446788700031815 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/6881D0F4189B32210C60ED9DD336C012/S0883769400054646a.pdf/div-class-title-mrs-volume-11-issue-3-back-cover-ibc-obc-and-matter-div.pdf 10.1557/s0883769400054646 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/A291CBD43AD6F7FA0F44E6592E214060/S0022149X00006660a.pdf/div-class-title-jhl-volume-54-issue-4-cover-and-back-matter-div.pdf 10.1017/s0022149x00006660 +https://www.cambridge.org/core/services/aop-cambridge-core/content/view/D54FA70B9F33DA6C34CD2C67A1447FF2/S2398568200000133a.pdf/div-class-title-the-work-statuses-of-slaves-and-freedmen-in-the-great-ports-of-the-roman-world-first-century-bce-second-century-ce-a-href-fn1s-ref-type-fn-a-div.pdf 10.1017/s2398568200000133 +https://www.degruyter.com/printpdf/view/IUPAC/iupac.28.0009 10.1515/iupac.28.0009 +https://www.jstage.jst.go.jp/article/cpb1958/33/4/33_4_1620/_pdf 10.1248/cpb.33.1620 +https://www.jstage.jst.go.jp/article/geosoc1893/83/1/83_1_19/_pdf 10.5575/geosoc.83.19 +https://www.jstage.jst.go.jp/article/gomu1944/36/6/36_6_543/_pdf 10.2324/gomu.36.6_543 +https://www.jstage.jst.go.jp/article/jhs1956/42/1/42_1_P30/_pdf 10.1248/jhs1956.42.p30 +https://www.jstage.jst.go.jp/article/jjrt/43/3/43_KJ00001362303/_pdf 10.6009/jjrt.kj00001362303 +https://www.jstage.jst.go.jp/article/jpestics1975/19/3/19_3_197/_pdf 10.1584/jpestics.19.3_197 +https://www.jstage.jst.go.jp/article/kaigan/67/2/67_2_I_376/_pdf 10.2208/kaigan.67.i_376 
+https://www.jstage.jst.go.jp/article/kantoh1988/2002/15/2002_15_153/_pdf 10.5690/kantoh.2002.153 +https://www.jstage.jst.go.jp/article/nikkashi1898/68/11/68_11_2145/_pdf 10.1246/nikkashi1898.68.11_2145 +https://www.jstage.jst.go.jp/article/rpsj1954/28/1/28_1_23/_pdf 10.4144/rpsj1954.28.23 +https://www.mededpublish.org/MedEdPublish/PDF/1544-8764.pdf 10.15694/mep.2018.0000087.1 +https://www.mededpublish.org/MedEdPublish/PDF/657-2457.pdf 10.15694/mep.2016.000135 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.1080/000155500750043096 10.1080/000155500750043096 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1046 10.2340/00015555-1046 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1107 10.2340/00015555-1107 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1328 10.2340/00015555-1328 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1371 10.2340/00015555-1371 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1505 10.2340/00015555-1505 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1613 10.2340/00015555-1613 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1702 10.2340/00015555-1702 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1890 10.2340/00015555-1890 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1920 10.2340/00015555-1920 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-2251 10.2340/00015555-2251 +https://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-2456 10.2340/00015555-2456 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-0965 10.2340/16501977-0965 
+https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-1239 10.2340/16501977-1239 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-1813 10.2340/16501977-1813 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-1817 10.2340/16501977-1817 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-1853 10.2340/16501977-1853 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-2077 10.2340/16501977-2077 +https://www.medicaljournals.se/jrm/content_files/download.php?doi=10.2340/16501977-2172 10.2340/16501977-2172 +https://www.nature.com/articles/4500257.pdf 10.1038/sj.pcan.4500257 +https://www.nature.com/articles/nm0911-1153c.pdf 10.1038/nm0911-1153c +https://www.orgchem.science.ru.nl/pubs/10.1039_b417766e.pdf 10.1039/b417766e +https://www.orgchem.science.ru.nl/pubs/10.1126_1668.pdf 10.1126/science.1164647 +https://www.osti.gov/servlets/purl/10190697 10.2172/10190697 +https://www.osti.gov/servlets/purl/1150486 10.2172/1150486 +https://www.osti.gov/servlets/purl/4121869 10.2172/4121869 +https://www.osti.gov/servlets/purl/4475066 10.2172/4475066 +https://www.osti.gov/servlets/purl/5348256 10.2172/5348256 +https://www.osti.gov/servlets/purl/6171208 10.2172/6171208 +https://www.osti.gov/servlets/purl/6181299 10.2172/6181299 +https://www.osti.gov/servlets/purl/6316100 10.2172/6316100 +https://www.researchgate.net/profile/Antoine_Ferreira/publication/221069395_Characterization_of_Protein_based_Spring-like_Elastic_Joints_for_Biorobotic_Applications/links/02bfe5137aa53c9adb000000.pdf 10.1109/robot.2006.1641966 +https://www.researchgate.net/profile/Bjorn_Winther-Jensen/publication/245080928_Plasma-polymerized_coatings_for_bio-MEMS_applications/links/00b4953511521dd3a6000000.pdf 10.1016/j.sna.2003.10.032 
+https://www.researchgate.net/profile/Keqi_Zhang/publication/224170284_A_three-dimensional_geographic_and_storm_surge_data_integration_system_for_evacuation_planning/links/09e4150cf37102c074000000.pdf 10.1109/iri.2010.5558942 +https://www.researchgate.net/profile/Peter_Baumung/publication/4241663_P2P-Based_Semantic_Service_Management_in_Mobile_Ad-hoc_Networks/links/00b4952682877efcfe000000.pdf 10.1109/mdm.2006.121 +https://www.researchgate.net/profile/Phillip_Wild/publication/222564117_Detecting_finite_bandwidth_periodic_signals_in_stationary_noise_using_the_signal_coherence_spectrum/links/0c9605213418842ec6000000.pdf 10.1016/j.sigpro.2005.02.008 +https://www.researchgate.net/profile/Sorin_Ciolofan2/publication/224089779_Recordingarchiving_in_IBM_lotus_sametime_based_collaborative_environment/links/54255ade0cf26120b7ac915e.pdf 10.1109/imcsit.2009.5352798 +https://www.tandfonline.com/doi/pdf/10.1080/0962021980020028?needAccess=true 10.1080/0962021980020028 +https://www.tandfonline.com/doi/pdf/10.1080/09853111.2007.9736326?needAccess=true 10.1080/09853111.2007.9736326 +https://www.tandfonline.com/doi/pdf/10.3109/02713688409000802?needAccess=true 10.3109/02713688409000802 +http://synapse.koreamed.org/Synapse/Data/PDFData/0068KJR/kjr-14-139.pdf 10.3348/kjr.2013.14.2.139 +http://users.wpi.edu/%7Eopavlov/index_files/WP+2014-006_Zaini+et+al.pdf 10.2139/ssrn.2465025 +http://virtualmentor.ama-assn.org/2007/03/pdf/ccas3-0703.pdf 10.1001/virtualmentor.2007.9.3.ccas3-0703 +http://web.inc.bme.hu/~csonka/csg/reprints/ijqc97v65p817.pdf 10.1002/(sici)1097-461x(1997)65:5<817::aid-qua46>3.3.co;2-r +http://web.inc.bme.hu/~csonka/csg/reprints/jcc01p0241.pdf 10.1002/1096-987x(20010130)22:2<241::aid-jcc11>3.0.co;2-c +http://web.mit.edu/bchen/www/pubs/it01-chen.pdf 10.1109/18.923725 +http://web.mit.edu/bchen/www/pubs/it01-chen.pdf 10.1109/isit.2000.866336 +http://web.mit.edu/bcs/graybiel-lab/publications/JNeuroPhy_Matsumoto.pdf 10.1152/jn.1999.82.2.978 
+http://web.mit.edu/dolecek/www/ISIT06.pdf 10.1109/isit.2006.261911 +http://web.mit.edu/dolecek/www/ITW07.pdf 10.1109/itw.2007.4313074 +http://web.mit.edu/hanlimc/www/hl.docs/ChoiHow_ACC09.pdf 10.1109/acc.2009.5160721 +http://web.mit.edu/larsb/www/BlackmoreOnoWilliamsIEEE-TRO11.pdf 10.1109/tro.2011.2161160 +http://web.mit.edu/medard/www/pubs/papers2012/delay.pdf 10.1109/wcnc.2012.6214165 +http://web.mit.edu/schulz/www/sw.ps 10.1287/moor.27.4.681.305 +http://web.mit.edu/sjgershm/www/dopamine.pdf 10.1007/978-1-4614-7320-6_631-3 +http://web.mit.edu/sloan-msa/Papers/1.12.pdf 10.1137/s089548019936223x +http://web.mit.edu/yuhong/www/Downing.pdf 10.1126/science.1063414 +http://www2.asanet.org/journals/asr/2006/052sp7.pdf 10.1177/000312240607100407 +http://www2.dac.com/41st/41acceptedpapers.nsf/0c4c09c6ffa905c487256b7b007afb72/b23ec16f6e1fc42c87256e54007a1f0a/$file/13_3.pdf 10.1145/996566.996624 +http://www2.informatik.uni-freiburg.de/%7Eali/papers/P2P-loadbalancing.pdf 10.1109/pdp.2014.79 +http://www.amjbot.org/content/100/10/2016.full.pdf 10.3732/ajb.1300036 +http://www.amjbot.org/content/90/5/749.full.pdf 10.3732/ajb.90.5.749 +http://www.amjbot.org/content/91/5/664.full.pdf 10.3732/ajb.91.5.664 +http://www.amjbot.org/content/92/7/1085.full.pdf 10.3732/ajb.92.7.1085 +http://www.amjbot.org/content/93/2/271.full.pdf 10.3732/ajb.93.2.271 +http://www.amjbot.org/content/94/4/568.full.pdf 10.3732/ajb.94.4.568 +http://www.amjbot.org/content/96/1/207.full.pdf 10.3732/ajb.0800348 +http://www.amjbot.org/content/99/10/1638.full.pdf 10.3732/ajb.1200279 +http://www.amjbot.org/content/99/5/e213.full.pdf 10.3732/ajb.1100519 +http://www.ams.org/qam/1960-18-03/S0033-569X-1960-0115596-7/S0033-569X-1960-0115596-7.pdf 10.1090/qam/115596 +http://www.bancaditalia.it/pubblicazioni/econo/temidi/td10/td759_10/en_td_759_10/en_tema_759.pdf 10.5089/9781455210732.001 +http://www.bath.ac.uk/elec-eng/research/sipg/papers/icip97ztdct.pdf 10.1109/icip.1997.638849 
+http://www.bath.ac.uk/math-sci/bics/preprints/BICS08_12.pdf 10.1002/nla.641 +http://www-brazos.rice.edu/brazos/papers/willowSC92.ps 10.1109/superc.1992.236669 +http://www.cais.ntu.edu.sg/~axsun/paper/sun_icadl07s.pdf 10.1007/978-3-540-77094-7_44 +http://www.cis.umassd.edu/%7Exbai/pubs/J-DirectionalCoverage.pdf 10.1109/mobhoc.2009.5336965 +http://www.cis.umassd.edu/~eeberbach/papers/cec2000.ps 10.1109/cec.2000.870810 +http://www.cis.umassd.edu/~vvokkarane/publications/DHP_Broadnets05.pdf 10.1109/icbn.2005.1589620 +http://www.cs.cas.cz/vera/publications/journals/I3Etools.pdf 10.1109/18.971754 +http://www.cs.cf.ac.uk/meshfiltering/index_files/Doc/p11-sun.pdf 10.1145/1236246.1236252 +http://www.cs.cornell.edu/~koch/www.infosys.uni-sb.de/publications/0606075.pdf 10.1109/icde.2007.367906 +http://www.cse.ucla.edu/products/Reports/TECH379.pdf 10.1201/9781439833704.ch13 +http://www.cs.ubc.ca/%7Ehassanm/ICIP10_HDR_MultiExposed_Stereo.pdf 10.1109/icip.2010.5653371 +http://www.cs.ubc.ca/~hllam/doc/LamMunzner_IncUtilStuMeta_2008.pdf 10.1145/1377966.1377969 +http://www.cs.ubc.ca/nest/lci/papers/1996/lloyd-icra96DAO.pdf 10.1109/robot.1996.506577 +http://www.cs.ubc.ca/~vogel/publications/icra08_VogelDeFreitas.pdf 10.1109/robot.2008.4543568 +http://www.cs.uwm.edu/%7Eccheng/papers/WG-long.pdf 10.1007/978-3-642-25870-1_11 +http://www.cs.uwm.edu/~suzuki/papers/algorithmicapaper.ps 10.1007/s00453-001-0045-3 +http://www.cs.uwm.edu/~wang/papers/virus.ps.gz 10.1007/978-0-387-35515-3_17 +http://www.dbs.informatik.uni-muenchen.de/~seidl/papers/SSTD01-Sequences.pdf 10.1007/3-540-47724-1_25 +http://www.degruyter.com/downloadpdf/j/biolog.2009.64.issue-6/s11756-009-0203-7/s11756-009-0203-7.xml 10.2478/s11756-009-0203-7 +http://www.degruyter.com/downloadpdf/j/dema.1975.8.issue-2/dema-1975-0207/dema-1975-0207.xml 10.1515/dema-1975-0207 +http://www.degruyter.com/downloadpdf/j/hzhz.1998.266.issue-1/hzhz.1998.266.jg.ii/hzhz.1998.266.jg.ii.xml 10.1524/hzhz.1998.266.jg.ii 
+http://www.ece.iisc.ernet.in/%7Enextgenwrl/papers/ananya_2014_twc.pdf 10.1109/twc.2014.2378279 +http://www.ece.stevens-tech.edu/~mouli/spiesteg02.pdf 10.1117/12.465273 +http://www.ee.ed.ac.uk/~dil/publications/june92.ps 10.1109/icwc.1992.200714 +http://www.ee.ed.ac.uk/~sasg/Papers/96_papers/ICPR96_whn.ps 10.1109/icpr.1996.546998 +http://www.ee.ed.ac.uk/~sasg/Papers/97_papers/ICASSP97_jst.ps 10.1109/icassp.1997.604827 +http://www.ee.ed.ac.uk/~sasg/Papers/98_papers/ISSSTA98a_ga.ps 10.1109/isssta.1998.723851 +http://www.efmaefm.org/0EFMAMEETINGS/EFMA%20ANNUAL%20MEETINGS/2010-Aarhus/EFMA2010_0067_fullpaper.pdf 10.1111/j.1467-629x.2012.00474.x +http://www.efmaefm.org/0EFMAMEETINGS/EFMA%20ANNUAL%20MEETINGS/2010-Aarhus/EFMA2010_0072_fullpaper.pdf 10.2139/ssrn.1568908 +http://www.efmaefm.org/0EFMAMEETINGS/EFMA%20ANNUAL%20MEETINGS/2010-Aarhus/EFMA2010_0074_fullpaper.pdf 10.2139/ssrn.1458963 +http://www.efmaefm.org/0EFMAMEETINGS/EFMA%20ANNUAL%20MEETINGS/2010-Aarhus/EFMA2010_0548_fullpaper.pdf 10.2139/ssrn.2519425 +http://www.efmaefm.org/0EFMAMEETINGS/EFMA%20ANNUAL%20MEETINGS/2010-Aarhus/Where%20Are%20the%20Smart%20Investors%2022.pdf 10.2139/ssrn.1541790 +http://www.efmaefm.org/efma2005/papers/30-bley_paper.pdf 10.1016/j.gfj.2006.06.009 +http://www.efmaefm.org/efma2006/papers/140989_full.pdf 10.1007/s10657-007-9001-2 +http://www.efmaefm.org/efma2006/papers/243629_full.pdf 10.1080/13504860600858030 +http://www.efmaefm.org/efmsympo2005/accepted_papers/06-Neil_Brisley_paper.pdf 10.1111/j.1540-6261.2006.01064.x +http://www.erudit.org/fr/revues/cgq/2010-v54-n152-cgq4005/045655ar.pdf 10.7202/045655ar +http://www.erudit.org/fr/revues/cqd/2016-v45-n2-cqd03114/1040395ar.pdf 10.7202/1040395ar +http://www.erudit.org/fr/revues/fa/2005-n19-fa1813267/1005327ar.pdf 10.7202/1005327ar +http://www.erudit.org/fr/revues/philoso/1996-v23-n1-philoso1802/027374ar.pdf 10.7202/027374ar +http://www.erudit.org/fr/revues/tce/1999-n61-tce600/008164ar.pdf 10.7202/008164ar 
+http://www.jspes.org/pdfs/fall2013/fall_2013_bookreviews.pdf 10.5860/choice.50-0271 +http://www.kantakji.com/media/5184/172-al-masri_07.pdf 10.4197/islec.17-2.7 +http://www.lance.colostate.edu/depts/ee/Research/vlsi/Pubs/elec_mfg_simp.ps 10.1109/iemt.1994.404688 +http://www.lcc.uma.es/~afdez/Papers/Papers/cp03.pdf 10.1007/978-3-540-45193-8_61 +http://www-m4.mathematik.tu-muenchen.de/m4/Papers/Klueppelberg/ex.ps.gz 10.2307/3215187 +http://www-m4.mathematik.tu-muenchen.de/m4/pers/tasche/Papers/least.ps 10.1007/978-3-642-57338-5_33 +http://www.maths.lse.ac.uk/Personal/amol/CDAM-LSE-2007-07.pdf 10.1007/s11785-009-0009-1 +http://www.math.tu-bs.de/~bussieck/MSNLP.pdf 10.1080/10556780902912389 +http://www.math.uga.edu/~pete/Stohr-Voloch.pdf 10.1112/plms/s3-52.1.1 +http://www.math.unipd.it/~claudia/download/classic_csl.ps 10.1007/bfb0028016 +http://www.mdpi.com/1420-3049/17/8/9573/pdf 10.3390/molecules17089573 +http://www.mdpi.com/1420-3049/2/10/M34/pdf 10.3390/m34 +http://www.mdpi.com/1420-3049/4/4/M92/pdf 10.3390/m92 +http://www.mdpi.com/2073-4344/6/12/199/pdf 10.3390/catal6120199 +http://www.medicaljournals.se/acta/content_files/download.php?doi=10.2340/00015555-1857 10.2340/00015555-1857 +http://www.medicaljournals.se/jrm/content/download.php?doi=10.1080%2F16501960410023877 10.1080/16501960410023877 +http://www.molevol.de/molevol2/publications/24.pdf 10.1007/bf00178247 +http://www.nada.kth.se/~anfa/smalllargeforce.pdf 10.1152/jn.2001.85.6.2613 +http://www.nada.kth.se/~christe/pdc99.ps.gz 10.1007/978-3-642-57313-2_20 +http://www.nada.kth.se/~danik/Papers/kragic_iros03_1.pdf 10.1109/iros.2003.1249684 +http://www.nada.kth.se/~ekvall/ekvall_icra2005_2.pdf 10.1109/robot.2005.1570207 +http://www.nada.kth.se/~enge/papers/k-CSP.pdf 10.1007/3-540-44666-4_27 +http://www.nada.kth.se/~enge/papers/k-CSP.pdf 10.1137/s0895480100380458 +http://www.nada.kth.se/~eral02/3107/Hood2000.pdf 10.1016/s1096-7494(00)00032-5 +http://www.nada.kth.se/~green/publications/files/greenmiscom.pdf 
10.1109/iros.2006.282256 +http://www.nada.kth.se/~helsing/97JMPS.ps 10.1016/s0022-5096(97)00041-0 +http://www.nada.kth.se/~hoffmann/cdc98c.pdf 10.1109/cdc.1998.761749 +http://www.nada.kth.se/~hoffmann/ifsanafips.pdf 10.1109/nafips.2001.943782 +http://www.nada.kth.se/~ojvind/papers/col.pdf 10.1016/s0020-0190(99)00064-2 +http://www.nature.com/articles/001062b0.pdf 10.1038/001062b0 +http://www.nature.com/articles/176248b0.pdf 10.1038/176248b0 +http://www.nature.com/articles/ncomms14224.pdf 10.1038/ncomms14224 +http://www.nature.com/articles/ncomms15496.pdf 10.1038/ncomms15496 +http://www.nature.com/articles/nri2050.pdf 10.1038/nri2050 +http://www.nature.com/articles/sj.bdj.2017.1116.pdf 10.1038/sj.bdj.2017.1116 +http://www.nature.com/articles/sj.bdj.2017.30.pdf 10.1038/sj.bdj.2017.30 +http://www.nature.com/articles/srep01163.pdf 10.1038/srep01163 +http://www.nature.com/articles/srep25482.pdf 10.1038/srep25482 +http://www.nature.com/bdj/journal/v216/n8/pdf/sj.bdj.2014.350.pdf 10.1038/sj.bdj.2014.350 +http://www.nature.com/ejcn/journal/v51/n2/pdf/1600368a.pdf 10.1038/sj.ejcn.1600368 +http://www.nature.com/hdy/journal/v28/n1/pdf/hdy197213a.pdf 10.1038/hdy.1972.13 +http://www.nature.com/jes/journal/v18/n3/pdf/7500594a.pdf 10.1038/sj.jes.7500594 +http://www.nature.com/leu/journal/v11/n1/pdf/2400524a.pdf 10.1038/sj.leu.2400524 +http://www.nature.com/modpathol/journal/v27/n8/pdf/modpathol2013228a.pdf 10.1038/modpathol.2013.228 +http://www.nature.com/mp/journal/v14/n3/pdf/4002121a.pdf 10.1038/sj.mp.4002121 +http://www.nature.com/pj/journal/v4/n4/pdf/pj197355a.pdf 10.1295/polymj.4.426 +http://www.nature.com/pr/journal/v26/n4/pdf/pr1989341a.pdf 10.1203/00006450-198910000-00013 +http://www.nature.com/sc/journal/v34/n5/pdf/sc199652a.pdf 10.1038/sc.1996.52 +http://www.ndsl.kr/soc_img/society/kimics/HOJBC0/2015/v19n2/HOJBC0_2015_v19n2_317.pdf 10.6109/jkiice.2015.19.2.317 +http://www.ndsl.kr/soc_img/society/ksht/OCRHB6/2012/v25n6/OCRHB6_2012_v25n6_292.pdf 
10.12656/jksht.2012.25.6.292 +http://www.pjbs.org/pjnonline/fin3340.pdf 10.3923/pjn.2015.721.726 +http://www.pnas.org/content/102/28/9913.full.pdf 10.1073/pnas.0504273102 +http://www.pnas.org/content/107/48/20738.full.pdf 10.1073/pnas.1009635107 +http://www.pnas.org/content/87/21/8360.full.pdf 10.1073/pnas.87.21.8360 +http://www.pnas.org/content/93/16/8379.full.pdf 10.1073/pnas.93.16.8379 +http://www.ripublication.com/ijcirv3/ijcirv3n1_18.pdf 10.5019/j.ijcir.2007.92 +http://www.sc2002.org/paperpdfs/pap.pap105.pdf 10.1109/sc.2002.10061 +http://www.sc2002.org/paperpdfs/pap.pap122.pdf 10.1109/sc.2002.10045 +http://www.sc2002.org/paperpdfs/pap.pap137.pdf 10.1109/sc.2002.10005 +http://www.sc2002.org/paperpdfs/pap.pap167.pdf 10.1109/sc.2002.10018 +http://www.sc2002.org/paperpdfs/pap.pap275.pdf 10.1109/sc.2002.10046 +http://www.sc2002.org/paperpdfs/pap.pap327.pdf 10.1109/sc.2002.10001 +http://www-sccm.stanford.edu/~elling/elling-schwan-99.ps.gz 10.1007/3-540-48311-x_25 +http://www-sccm.stanford.edu/pub/sccm/sccm94-04.ps.gz 10.1117/12.190861 +http://www-sccm.stanford.edu/Students/vanderveen/SPtrans98b.ps.gz 10.1109/78.661335 +http://www.sci.brooklyn.cuny.edu/~eskicioglu/papers/EI2006-6072-2.pdf 10.1117/12.651154 +http://www.sci.brooklyn.cuny.edu/~eskicioglu/papers/ICN2002.pdf 10.1142/9789812776730_0019 +http://www.sci.brooklyn.cuny.edu/homepages/whitlock/papsc01.pdf 10.1007/3-540-45346-6_14 +http://www.scielo.br/pdf/brag/v33nunico/06.pdf 10.1590/s0006-87051974000100006 +http://www.scielo.br/pdf/jaos/v21n4/1678-7757-jaos-21-4-0300.pdf 10.1590/1678-775720130066 +http://www.scielo.br/pdf/pci/v14n2/v14n2a12.pdf 10.1590/s1413-99362009000200012 +http://www.scielo.br/pdf/rbent/v47n1/16467.pdf 10.1590/s0085-56262003000100014 +http://www.scielo.br/pdf/rbmet/v31n4s1/0102-7786-rbmet-31-04-s1-0662.pdf 10.1590/0102-7786312314b20150157 +http://www.scielo.br/pdf/rbz/v39n4/v39n4a25.pdf 10.1590/s1516-35982010000400025 +http://www.scielo.br/pdf/rlae/v1n2/v1n2a03.pdf 
10.1590/s0104-11691993000200003 +http://www.seas.upenn.edu/%7Etanmoy/papers/CK08.pdf 10.1007/978-3-540-92185-1_61 +http://www.seas.upenn.edu/~biros/papers/lnks/paper.pdf 10.1137/s106482750241565x +http://www.seas.upenn.edu/~katef/papers/ECCV2010_FGsemi_supervised.pdf 10.1007/978-3-642-15567-3_41 +http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac2003/papers/2003/dac03/pdffiles/20_4.pdf 10.1145/775832.775918 +http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac97/papers/1997/dac97/pdffiles/33_2.pdf 10.1109/dac.1997.597206 +http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac99/papers/1999/dac99/pdffiles/06_4.pdf 10.1109/43.851998 +http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac99/papers/1999/dac99/pdffiles/06_4.pdf 10.1145/309847.309885 +http://www.sigda.org/Archives/ProceedingArchives/Iccad/Iccad2001/papers/2001/iccad01/pdffiles/09a_2.pdf 10.1109/iccad.2001.968700 +http://www.sigda.org/Archives/ProceedingArchives/Iccad/Last20/Papers/1995/ICCAD95_0045.pdf 10.1109/iccad.1995.479989 +http://www.site.uottawa.ca/%7Ebochmann/dsrg/PublicDocuments/Publications/Zhan09b.pdf 10.1108/03321640910999914 +http://www.site.uottawa.ca/%7Eivan/beaconless+TC.pdf 10.1109/tc.2012.160 +http://www.site.uottawa.ca/~adler/publications/2004/qu-elsaddik-adler-2004-ccece-stroke-based-signature.pdf 10.1109/ccece.2004.1345055 +http://www.site.uottawa.ca/~bochmann/dsrg/PublicDocuments/Publications/Zhen06a.pdf 10.1109/sarnof.2006.4534712 +http://www.site.uottawa.ca/~bouchard/publis/icassp2004_pap.pdf 10.1109/icassp.2004.1326778 +http://www.site.uottawa.ca/~ivan/harmony-ICC.pdf 10.1109/icc.2012.6363859 +http://www.site.uottawa.ca/~klement/papers/cai2009.pdf 10.1007/978-3-642-01818-3_11 +http://www.site.uottawa.ca/research/viva/projects/ibr/publications/EDubois_SPL.pdf 10.1109/lsp.2005.859503 +http://www.site.uottawa.ca/~sylvia/techreports/techreportfacetgentech.pdf 10.1007/978-3-540-76796-1_2 +http://www.site.uottawa.ca/~wgong/pdfs/ACOSINFOCOM.pdf 10.1109/infocom.2014.6847972 
+http://www.softlab.is.tsukuba.ac.jp/iplab/paper/international/miuramo-kes2005.pdf 10.1007/11552413_60 +http://www.tandfonline.com/doi/pdf/10.1016/S0968-8080%2804%2924137-4?needAccess=true 10.1016/s0968-8080(04)24137-4 +http://www.tandfonline.com/doi/pdf/10.1080/00021369.1985.10866942?needAccess=true 10.1080/00021369.1985.10866942 +http://www.tandfonline.com/doi/pdf/10.1080/00173139009429979?needAccess=true 10.1080/00173139009429979 +http://www.tandfonline.com/doi/pdf/10.1080/02786820902939708?needAccess=true 10.1080/02786820902939708 +http://www.tandfonline.com/doi/pdf/10.1080/07438141.2011.627625?needAccess=true 10.1080/07438141.2011.627625 +http://www.tandfonline.com/doi/pdf/10.1626/pps.15.253?needAccess=true 10.1626/pps.15.253 +http://www.tandfonline.com/doi/pdf/10.3109/15563657708992433?needAccess=true 10.3109/15563657708992433 +http://www.triumf.ca/pac97/papers/pdf/9P006.PDF 10.1109/pac.1997.749581 +http://www.triumf.ca/pac97/papers/pdf/9V005.PDF 10.1109/pac.1997.750741 +http://www.triumf.ca/pac97/papers/pdf/9V005.PDF 10.1109/pac.1997.750742 +http://www.weizmann.ac.il/immunology/iruncohen/reprints/2006/461.pdf 10.1191/0961203306lu2328oa diff --git a/report_template.md b/report_template.md deleted file mode 100644 index 139598b..0000000 --- a/report_template.md +++ /dev/null @@ -1,108 +0,0 @@ - -# Crawl QA Report - -This crawl report is auto-generated from a sqlite database file, which should be available/included. 
- -### Seedlist Stats - -```sql -SELECT COUNT(DISTINCT identifier) as identifiers, COUNT(DISTINCT initial_url) as uris, COUNT(DISTINCT initial_domain) AS domains FROM crawl_result; -``` - -FTP seed URLs - -```sql -SELECT COUNT(*) as ftp_urls FROM crawl_result WHERE initial_url LIKE 'ftp://%'; -``` - -### Successful Hits - -```sql -SELECT COUNT(DISTINCT identifier) as identifiers, COUNT(DISTINCT initial_url) as uris, COUNT(DISTINCT final_sha1) as unique_sha1 FROM crawl_result WHERE hit=1; -``` - -De-duplication percentage (aka, fraction of hits where content had been crawled and identified previously): - -```sql -# AVG() hack! -SELECT 100. * AVG(final_was_dedupe) as percent FROM crawl_result WHERE hit=1; -``` - -Top mimetypes for successful hits (these are usually filtered to a fixed list in post-processing): - -```sql -SELECT final_mimetype, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY final_mimetype ORDER BY COUNT(*) DESC LIMIT 10; -``` - -Most popular breadcrumbs (a measure of how hard the crawler had to work): - -```sql -SELECT breadcrumbs, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY breadcrumbs ORDER BY COUNT(*) DESC LIMIT 10; -``` - -FTP vs. HTTP hits (200 is HTTP, 226 is FTP): - -```sql -SELECT final_status_code, COUNT(*) FROM crawl_result WHERE hit=1 GROUP BY final_status_code LIMIT 10; -``` - -### Domain Summary - -Top *initial* domains: - -```sql -SELECT initial_domain, COUNT(*), 100. * COUNT(*) / (SELECT COUNT(*) FROM crawl_result) as percent FROM crawl_result GROUP BY initial_domain ORDER BY count(*) DESC LIMIT 20; -``` - -Top *successful, final* domains, where hits were found: - -```sql - -SELECT initial_domain, COUNT(*), 100. 
* COUNT(*) / (SELECT COUNT(*) FROM crawl_result WHERE hit=1) AS percent FROM crawl_result WHERE hit=1 GROUP BY initial_domain ORDER BY COUNT(*) DESC LIMIT 20; -``` - -Top *non-successful, final* domains where crawl paths terminated before a successful hit (but crawl did run): - -```sql -SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code IS NOT NULL GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; -``` - -Top *uncrawled, initial* domains, where the crawl didn't even attempt to run: - -```sql -SELECT initial_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code IS NULL GROUP BY initial_domain ORDER BY count(*) DESC LIMIT 20; -``` - -Top *blocked, final* domains: - -```sql -SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND (final_status_code='-61' OR final_status_code='-2') GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; -``` - -Top *rate-limited, final* domains: - -```sql -SELECT final_domain, COUNT(*) FROM crawl_result WHERE hit=0 AND final_status_code='429' GROUP BY final_domain ORDER BY count(*) DESC LIMIT 20; -``` - -### Status Summary - -Top failure status codes: - -```sql - SELECT final_status_code, COUNT(*) FROM crawl_result WHERE hit=0 GROUP BY final_status_code ORDER BY count(*) DESC LIMIT 10; -``` - -### Example Results - -A handful of random success lines: - -```sql - SELECT identifier, initial_url, breadcrumbs, final_url, final_sha1, final_mimetype FROM crawl_result WHERE hit=1 ORDER BY random() LIMIT 10; -``` - -Handful of random non-success lines: - -```sql - SELECT identifier, initial_url, breadcrumbs, final_url, final_status_code, final_mimetype FROM crawl_result WHERE hit=0 ORDER BY random() LIMIT 25; -``` diff --git a/test.sqlite b/test.sqlite deleted file mode 100644 index a0435e6..0000000 Binary files a/test.sqlite and /dev/null differ diff --git a/test.tsv b/test.tsv deleted file mode 100644 index dbd9620..0000000 --- a/test.tsv +++ /dev/null @@ -1,10 +0,0 @@ 
-d6571a951347fb19c55a8d9d30b578e14b55be10 -f76971159f5c35b9c900eba23a757d47afd03fc9 -2fd5cc631fbac0cb6d737e357377ed482235487d -17a2bdb7ca5aff57b20bdb7b72e893fce00304a0 -d487d844f6b0403113f814cfd6669b5007a371a7 -516144dac67a47bf23c0c9fa8530e95e8093105d -35bb4705895240d3191e93601a16eb421bec850b -f16c0eb11deb87f6194df7930652432b483192bc -cfdeccc0a94df2e50316fbbd31b508eac14c9b15 -ba5d22f94bcf267a88d3f097d7b95f499c025c15 -- cgit v1.2.3