Fatcat "Chocula" Journal Metadata Summary

This report is auto-generated from a SQLite database file, which should be available alongside it.

datetime('now')
2019-08-01 03:55:43
QUERY: SELECT datetime('now');
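
Each block in this report is a header row, the result rows, and the query that produced them. A minimal sketch of how such a block could be rendered with Python's sqlite3 module follows. The `render_block` helper, the toy `journal` schema, and the sample rows are assumptions for illustration, not the generator actually used. The sketch shows NULL keys as empty cells, which is one possible explanation for rows in the tables below that have fewer fields than the header; the report does not say whether its blank keys are NULL or empty strings.

```python
import sqlite3

def render_block(conn, query):
    """Run one query and format it as header, rows, then a QUERY line."""
    cur = conn.execute(query)
    lines = [" ".join(col[0] for col in cur.description)]
    for row in cur:
        # NULL is rendered as an empty cell, so such rows show fewer fields
        cells = ["" if value is None else str(value) for value in row]
        lines.append(" ".join(cells).strip())
    lines.append("QUERY: " + query)
    return "\n".join(lines)

# Toy in-memory stand-in for the real database file (schema is assumed)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (country TEXT, release_count INTEGER)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [("us", 10), ("us", 5), (None, 7), ("gb", 3)],
)
block = render_block(
    conn,
    "SELECT country, COUNT(*) AS journal_count, SUM(release_count) AS releases "
    "FROM journal GROUP BY country ORDER BY journal_count DESC, country;",
)
print(block)
```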

Note that nearly all of the fatcat release stats are counted per release, not per work, so there may be over-counting. Also, as of July 2019 there were over 1.5 million OA longtail releases that are not linked to a container (journal).
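
To illustrate the over-counting caveat, here is a small sketch under assumed data: a hypothetical `release` table in which two releases (a preprint and a published version) belong to the same work. The table and its columns are invented for the example and are not part of the report's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per release, each pointing at its work
conn.execute("CREATE TABLE release (ident TEXT, work_id TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO release VALUES (?, ?, ?)",
    [
        ("r1", "w1", "preprint"),
        ("r2", "w1", "published"),
        ("r3", "w2", "published"),
    ],
)
# Counting rows counts releases; counting distinct work_id counts works
release_count, work_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT work_id) FROM release"
).fetchone()
print(release_count, work_count)
```

A per-release count is higher than a per-work count whenever a work has more than one release, which is the direction of error the note describes.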

Basics

Top countries by journal count (and fatcat release counts):

country journal_count sum(release_count)
(blank) 91931 34853365
us 6838 20812424
gb 5967 12238711
nl 2343 7763639
de 1841 4176386
id 1562 112525
br 1501 614272
es 1012 275328
pl 807 256632
it 803 304793
QUERY: SELECT country, COUNT(*) AS journal_count, sum(release_count) from journal group by country order by count(*) desc limit 10;

Top languages by journal count (and fatcat release counts):

lang journal_count release_count
(blank) 96856 39729766
en 25584 46389136
es 738 105717
id 587 35909
pt 560 99100
de 504 1050664
fr 420 314582
ja 330 589020
ru 245 150367
it 202 97561
QUERY: SELECT lang, COUNT(*) as journal_count, sum(release_count) as release_count FROM journal GROUP BY lang ORDER BY COUNT(*) DESC LIMIT 10;

Aggregate fatcat fulltext release coverage by OA status:

is_oa journal_count SUM(release_count) SUM(ia_count) total_ia_frac
0 74985 42533886 5678063 0.13
1 51850 46171246 10357705 0.22
QUERY: SELECT is_oa, COUNT(*) AS journal_count, SUM(release_count), SUM(ia_count), ROUND(1. * SUM(ia_count) / SUM(release_count), 2) as total_ia_frac FROM journal GROUP BY is_oa;
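
The `1. *` factor in the query above matters because SQLite divides two integers as integers. A short sketch using the is_oa = 0 totals from the table (5678063 and 42533886) shows the difference; only the in-memory connection is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without the float factor, integer division truncates the fraction to 0
int_frac, real_frac = conn.execute(
    "SELECT 5678063 / 42533886, ROUND(1. * 5678063 / 42533886, 2)"
).fetchone()
print(int_frac, real_frac)
```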

Publisher Segmentation

Big publishers by journal count:

publisher journal_count SUM(release_count)
(blank) 47203 6504661
Elsevier 4009 16206074
Informa UK (Taylor & Francis) 3332 3921600
Springer-Verlag 2875 5638303
SAGE Publications 1372 2344281
Peter Lang International Academic Publishers 1360 252
Wiley (Blackwell Publishing) 1173 3640989
Wiley (John Wiley & Sons) 1039 4456867
Walter de Gruyter GmbH 624 435616
Springer (Biomed Central Ltd.) 558 450187
Cambridge University Press 553 1519555
Hindawi Limited 521 194707
Georg Thieme Verlag KG 512 688731
OMICS Publishing Group 502 93785
JSTOR 495 738890
QUERY: SELECT publisher, COUNT(*) AS journal_count, SUM(release_count) from journal GROUP BY publisher ORDER BY COUNT(*) DESC LIMIT 15;

Number of publishers with 3 or fewer journals:

COUNT(*)
18307
QUERY: SELECT COUNT(*) FROM (SELECT publisher, COUNT(*) as journal_count FROM journal GROUP BY publisher) WHERE journal_count <= 3;

Fulltext coverage by publisher type:

publisher_type ia_total_frac preserved_total_frac journal_count paper_count
big5 0.12 0.89 15362 39334593
society 0.25 0.71 8545 17499721
(blank) 0.15 0.28 59716 13550129
commercial 0.12 0.84 6608 6041845
unipress 0.24 0.84 6017 5876070
longtail 0.48 0.54 25523 2216976
oa 0.76 0.84 2476 1180835
repository 0.13 0.3 646 925092
other 0.08 0.88 927 861701
archive 0.26 0.98 604 792273
scielo 0.8 0.81 411 425897
QUERY: SELECT publisher_type, ROUND(1.0 * SUM(ia_count) / SUM(release_count), 2) as ia_total_frac, ROUND(1.0 * SUM(preserved_count) / SUM(release_count), 2) as preserved_total_frac, count(*) as journal_count, sum(release_count) as paper_count from journal group by publisher_type order by sum(release_count) desc;

Fulltext coverage by publisher type (NOTE: this intentionally averages per-journal fractions without weighting by release count):

publisher_type avg_ia_frac avg_preserved_frac journal_count paper_count
big5 0.15 0.81 15362 39334593
society 0.32 0.53 8545 17499721
(blank) 0.24 0.31 59716 13550129
commercial 0.26 0.76 6608 6041845
unipress 0.42 0.69 6017 5876070
longtail 0.55 0.58 25523 2216976
oa 0.63 0.8 2476 1180835
repository 0.04 0.19 646 925092
other 0.15 0.65 927 861701
archive 0.31 0.98 604 792273
scielo 0.83 0.85 411 425897
QUERY: SELECT publisher_type, ROUND(1.0 * AVG(ia_frac), 2) as avg_ia_frac, ROUND(1.0 * AVG(preserved_frac), 2) as avg_preserved_frac, count(*) as journal_count, sum(release_count) as paper_count from journal group by publisher_type order by sum(release_count) DESC;
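
The two coverage tables above differ only in how they aggregate. A sketch with two invented journals shows why the numbers can diverge. It assumes `ia_frac` is `ia_count / release_count` per journal, which the report does not state explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (name TEXT, ia_count INTEGER, "
    "release_count INTEGER, ia_frac REAL)"
)
# One large journal with low coverage, one small journal with high coverage
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?)",
    [("big", 100, 1000, 0.1), ("small", 9, 10, 0.9)],
)
weighted, unweighted = conn.execute(
    "SELECT ROUND(1.0 * SUM(ia_count) / SUM(release_count), 2), "
    "ROUND(AVG(ia_frac), 2) FROM journal"
).fetchone()
print(weighted, unweighted)
```

The release-weighted figure is dominated by the large journal, while the unweighted average treats every journal equally, so small well-covered journals pull it up.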

Number of journals with no releases (metadata or fulltext) in fatcat:

publisher_type journals_with_no_releases
(blank) 21195
longtail 13194
society 1770
commercial 1626
unipress 1521
big5 363
oa 85
archive 45
other 17
repository 10
QUERY: SELECT publisher_type, COUNT(*) AS journals_with_no_releases FROM journal WHERE release_count = 0 GROUP BY publisher_type ORDER BY COUNT(*) DESC;

IA Fulltext Coverage

Coverage by sherpa color:

sherpa_color ia_fulltext_count release_count total_ia_frac
(blank) 5076210 26342117 0.19
blue 799381 2876891 0.28
green 7873174 41516320 0.19
white 483434 4632715 0.1
yellow 1803569 13337089 0.14
QUERY: SELECT sherpa_color, SUM(ia_count) as ia_fulltext_count, SUM(release_count) as release_count, ROUND(1.0 * SUM(ia_count) / SUM(release_count), 2) as total_ia_frac FROM journal GROUP BY sherpa_color;

Top publishers by number of journals with very little IA coverage (ia_frac < 0.05; NOTE: averaging fractions without weighting by journal size):

publisher journal_count ROUND(avg(ia_frac),3)
(blank) 13226 0.001
Informa UK (Taylor & Francis) 2081 0.019
Elsevier 2053 0.015
SAGE Publications 764 0.018
Springer-Verlag 761 0.018
Wiley (Blackwell Publishing) 641 0.02
Wiley (John Wiley & Sons) 593 0.017
JSTOR 295 0.005
CAIRN 280 0.012
Medknow Publications 280 0.008
QUERY: SELECT publisher, count(*) as journal_count, ROUND(avg(ia_frac),3) from journal where ia_frac < 0.05 group by publisher order by count(*) desc limit 10;

Homepages

Journal counts by homepage status:

any_homepage any_live_homepage any_gwb_homepage COUNT(*) frac
0 0 0 65614 0.52
1 0 0 5434 0.04
1 0 1 4843 0.04
1 1 0 3624 0.03
1 1 1 47320 0.37
QUERY: SELECT any_homepage, any_live_homepage, any_gwb_homepage, COUNT(*), ROUND(1.0 * COUNT(*) / (SELECT COUNT(*) FROM journal), 2) AS frac FROM journal GROUP BY any_homepage, any_live_homepage, any_gwb_homepage;

Number of unique journals that have a homepage pointing to wayback or archive.org:

COUNT(DISTINCT issnl)
154
QUERY: SELECT COUNT(DISTINCT issnl) FROM homepage WHERE domain = 'archive.org';

Top publishers that have journals in wayback:

publisher COUNT(*)
(blank) 63
EDP Sciences 11
PERSEE Program 3
CAIRN 2
Fabula 2
Institut du monde et du développement pour la bonne gouvernance publique 2
ANPAD 1
Ad hoc (Rennes) 1
Asociación Revista Venezolana de Ciencia y Tecnología de Alimentos 1
Association Epiga 1
QUERY: SELECT publisher, COUNT(*) FROM journal LEFT JOIN homepage ON journal.issnl = homepage.issnl WHERE homepage.domain = 'archive.org' GROUP BY journal.publisher ORDER BY COUNT(*) DESC LIMIT 10;

Homepage URL counts:

rows issnls surts
83909 61221 82678
QUERY: SELECT COUNT(*) as rows, COUNT(DISTINCT issnl) as issnls, COUNT(DISTINCT surt) as surts FROM homepage;

Journals with the most homepage entries (COUNT(*) per ISSN-L):

issnl COUNT(*)
0717-3458 6
1406-4243 6
2190-5991 6
0011-6793 5
0022-9830 5
0091-6765 5
0102-7638 5
0144-8463 5
0212-6567 5
0350-154X 5
QUERY: SELECT issnl, COUNT(*) from homepage GROUP BY issnl ORDER BY COUNT(*) DESC LIMIT 10;

Blocked domains:

domain count(*) sum(blocked)
jstor.org 3235 3234
brill.nl 216 161
wiley.com 2372 152
bentham.org 146 146
tandfonline.com 2919 84
cairn.info 52 49
emeraldgrouppublishing.com 49 49
emeraldinsight.com 390 16
rodopi.nl 19 15
sagepub.com 1863 9
vsppub.com 9 9
iaster.com 7 7
mohr.de 15 7
scienceq.org 7 7
uctjournals.com 7 7
elsevier.com 2746 6
esaunggul.ac.id 6 6
bloomsbury.com 4 4
gov.hu 8 4
inap.es 4 4
QUERY: SELECT domain, count(*), sum(blocked) from homepage group by domain order by sum(blocked) desc limit 20;

Top duplicated URLs and SURTs:

url COUNT(*)
http://www.studia.ubbcluj.ro/serii/index_en.html 22
http://jurnal.unimed.ac.id/ 12
https://benjamins.com/ 10
http://www.ecorfan.org/bolivia/research_journals.php 9
http://www.minervamedica.it/index2.t 8
https://www.benjamins.com/ 8
http://gesundheitsfoerderung.ch/ueber-uns/downloads.html 6
http://www.inderscience.com/browse/index.php 6
http://www.iospress.nl/ 6
http://edizionicafoscari.unive.it/it/edizioni/collane/antichistica/ 5
QUERY: SELECT url, COUNT(*) FROM homepage GROUP BY url ORDER BY COUNT(*) DESC LIMIT 10;

Top terminal URLs (these catch cases where many homepage URLs redirect to a single page):

terminal_url COUNT(DISTINCT issnl)
https://portal.eiu.com/Login.aspx?c=1 180
https://taylorandfrancis.com 151
https://onlinelibrary.wiley.com/ 131
http://eia.libis.lt/aboutProject.php 70
https://www.hindawi.com/journals/isrn/ 67
https://us.sagepub.com/en-us/nam/insights-journals 61
https://metapress.com 44
http://blackwell-science.com/ 41
https://staatsbibliothek-berlin.de/Emedien-Meldungen/Login-Hinweis/ 40
https://taylorandfrancis.com/ 37
https://pubs.rsc.org/en/ebooks 36
http://explore.tandfonline.com/page/ah/maney-publishing-journals 34
http://www.bentham.org/403.shtml 27
https://metapress.com/ 27
http://www.inderscience.com/browse/index.php 23
http://www.studia.ubbcluj.ro/serii/index_en.html 22
https://www.impresaitalia.info 20
https://www.tandfonline.com/ 20
https://benjamins.com/content/home 17
https://content-select.com/login 15
QUERY: SELECT terminal_url, COUNT(DISTINCT issnl) FROM homepage WHERE terminal_url IS NOT NULL GROUP BY terminal_url ORDER BY COUNT(DISTINCT issnl) DESC LIMIT 20;

Top duplicated SURTs:

surt COUNT(*)
org,rsc,pubs)/en/ebooks 47
com,benjamins)/ 27
ro,ubbcluj,studia)/serii/index_en.html 22
id,ac,unimed,jurnal)/ 12
org,ecorfan)/bolivia/research_journals.php 9
it,minervamedica)/index2.t 8
ch,gesundheitsfoerderung)/ueber-uns/downloads.html 6
com,inderscience)/browse/index.php 6
nl,iospress)/ 6
com,inderscience)/ 5
QUERY: SELECT surt, COUNT(*) FROM homepage GROUP BY surt ORDER BY COUNT(*) DESC LIMIT 10;
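
SURT counts can exceed raw URL counts because several URL spellings collapse to one key. The sketch below is a deliberately simplified SURT-like transform, not the canonicalization actually used to build the `surt` column: the `simple_surt` helper is hypothetical and only lowercases the host, drops a leading `www.`, and reverses the host labels.

```python
from urllib.parse import urlsplit

def simple_surt(url):
    """Simplified SURT-like key: reversed host labels, then ')' and the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com alike
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key

for url in (
    "https://benjamins.com/",
    "https://www.benjamins.com/",
    "https://pubs.rsc.org/en/ebooks",
):
    print(url, "->", simple_surt(url))
```

Under this simplification both benjamins.com spellings from the URL table map to the same key, which is consistent with that SURT appearing more often than either URL alone.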