datetime('now')
2019-12-26 20:48:12
QUERY: SELECT datetime('now');
seq | name | file
0 | main | /home/bnewbold/code/chocula/chocula.2019-12-26.sqlite
QUERY: PRAGMA database_list;
Top publishers by journal count:
publisher | COUNT(*)
(blank) | 43367
Elsevier | 4060
Informa UK (Taylor & Francis) | 3363
Springer-Verlag | 2938
SAGE Publications | 1437
Peter Lang International Academic Publishers | 1357
Wiley (Blackwell Publishing) | 1167
Wiley (John Wiley & Sons) | 1083
Walter de Gruyter GmbH | 629
Springer (Biomed Central Ltd.) | 594
Cambridge University Press | 580
Georg Thieme Verlag KG | 534
Hindawi Limited | 533
OMICS Publishing Group | 504
JSTOR | 482
Oxford University Press | 481
Medknow Publications | 474
Emerald (MCB UP ) | 470
De Gruyter Open Sp. z o.o. | 451
Inderscience Enterprises Ltd | 448
Bentham Science | 435
CAIRN | 415
Institute of Electrical and Electronics Engineers | 395
Brill | 374
OpenEdition | 374
QUERY: SELECT publisher, COUNT(*)
FROM journal
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 25;
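The largest group in the result above has no publisher name at all. A minimal sketch of the same ranking with those journals left out, run against a toy in-memory table: the rows are invented, and whether the real column stores NULL or an empty string for a missing publisher is an assumption, so the filter covers both.

```python
import sqlite3

# Toy stand-in for the real `journal` table; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (issnl TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO journal VALUES (?, ?)",
    [
        ("0000-0001", "Elsevier"),
        ("0000-0002", "Elsevier"),
        ("0000-0003", "Brill"),
        ("0000-0004", None),  # publisher unknown (NULL)
        ("0000-0005", ""),    # publisher stored as an empty string
        ("0000-0006", None),
    ],
)

# Same ranking as the report query, skipping journals without a publisher.
named_only = conn.execute(
    """
    SELECT publisher, COUNT(*)
    FROM journal
    WHERE publisher IS NOT NULL AND publisher != ''
    GROUP BY publisher
    ORDER BY COUNT(*) DESC
    LIMIT 25
    """
).fetchall()

print(named_only)  # [('Elsevier', 2), ('Brill', 1)]
```

Without the WHERE clause, SQLite puts all NULL publishers into a single group, which is one way a nameless row can end up at the top of the list.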
Top countries by number of journals:
country | COUNT(*)
(blank) | 106970
us | 7547
gb | 6411
nl | 2497
de | 2026
id | 1645
br | 1569
no | 1079
es | 1069
pl | 898
QUERY: SELECT country,
COUNT(*)
FROM journal
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 10;
... by number of papers:
country | COUNT(*) | SUM(release_count)
(blank) | 106970 | 34471001
us | 7547 | 22313191
gb | 6411 | 13028344
nl | 2497 | 8089964
de | 2026 | 4391511
ch | 698 | 1320180
jp | 627 | 1311086
fr | 843 | 905916
br | 1569 | 683462
ca | 618 | 526316
QUERY: SELECT country,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY country
ORDER BY SUM(release_count) DESC
LIMIT 10;
Top languages by number of journals:
lang | COUNT(*)
(blank) | 115227
en | 25571
es | 738
id | 587
pt | 560
de | 504
fr | 420
ja | 330
ru | 245
it | 201
QUERY: SELECT lang,
COUNT(*)
FROM journal
GROUP BY lang
ORDER BY COUNT(*) DESC
LIMIT 10;
... by number of papers:
lang | COUNT(*) | SUM(release_count)
en | 25571 | 48034446
(blank) | 115227 | 41589472
de | 504 | 1061813
ja | 330 | 567233
fr | 420 | 327575
ru | 245 | 166286
es | 738 | 119675
pt | 560 | 116237
it | 201 | 98440
zh | 62 | 45780
QUERY: SELECT lang,
COUNT(*),
SUM(release_count)
FROM journal
GROUP BY lang
ORDER BY SUM(release_count) DESC
LIMIT 10;
Fulltext coverage by publisher type:
publisher_type | AVG(ia_frac) | AVG(preserved_frac) | journal_count | paper_count
big5 | 0.13790374942184175 | 0.777494354837538 | 15643 | 40212427
society | 0.26629892659791143 | 0.4477543353941486 | 10588 | 18162939
(blank) | 0.18932165820389008 | 0.24842923113768445 | 64146 | 14187155
unipress | 0.33285972356984356 | 0.5661068841349259 | 7549 | 6200069
commercial | 0.22752849270250103 | 0.6925074605186571 | 6796 | 5971082
longtail | 0.45711865781170435 | 0.496898741198817 | 35197 | 3147895
oa | 0.5556916593868192 | 0.7334655691426084 | 2581 | 1343062
repository | 0.03318653234900813 | 0.1597687993504964 | 751 | 993044
other | 0.1393788759995973 | 0.6215256674027421 | 939 | 862821
archive | 0.31004617042048294 | 0.9824231978239752 | 597 | 765813
scielo | 0.7876453723874977 | 0.8017765242492674 | 405 | 432115
QUERY: SELECT publisher_type,
AVG(ia_frac),
AVG(preserved_frac),
COUNT(*) AS journal_count,
SUM(release_count) AS paper_count
FROM journal
GROUP BY publisher_type
ORDER BY SUM(release_count) DESC;
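AVG(ia_frac) in the query above is an unweighted mean over journals, so a ten-paper journal counts as much as a thousand-paper one. A minimal sketch of how that differs from a per-paper coverage figure, using invented rows. It assumes ia_frac is ia_count divided by release_count, which the report does not state.

```python
import sqlite3

# Toy `journal` table; column names follow the report, rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (publisher_type TEXT, release_count INTEGER, "
    "ia_count INTEGER, ia_frac REAL)"
)
# One large journal with 10% coverage and one small journal with 100%.
rows = [("big5", 1000, 100), ("big5", 10, 10)]
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?)",
    # Assumption: ia_frac = ia_count / release_count.
    [(ptype, rel, ia, ia / rel) for (ptype, rel, ia) in rows],
)

per_journal, per_paper = conn.execute(
    """
    SELECT AVG(ia_frac),                            -- mean over journals
           SUM(ia_count) * 1.0 / SUM(release_count) -- share of all papers
    FROM journal
    GROUP BY publisher_type
    """
).fetchone()

print(round(per_journal, 3), round(per_paper, 3))  # 0.55 0.109
```

The two figures answer different questions: the first describes a typical journal of that type, the second describes the share of papers covered.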
Top publishers with very little coverage:
publisher | journal_count | AVG(ia_frac)
(blank) | 12489 | 0.0007462422716968573
Informa UK (Taylor & Francis) | 2159 | 0.018535223377689123
Elsevier | 2087 | 0.015386849708409962
Springer-Verlag | 854 | 0.017150697644092168
SAGE Publications | 834 | 0.01744322992299731
Wiley (Blackwell Publishing) | 650 | 0.019927080180958748
Wiley (John Wiley & Sons) | 647 | 0.01600965663955534
CAIRN | 363 | 0.008910075730899723
Medknow Publications | 313 | 0.007785659242349944
JSTOR | 285 | 0.004924726114898372
QUERY: SELECT publisher,
COUNT(*) AS journal_count,
AVG(ia_frac)
FROM journal
WHERE ia_frac < 0.05
GROUP BY publisher
ORDER BY journal_count DESC
LIMIT 10;
Amount of fulltext by SHERPA/RoMEO journal color:
sherpa_color | SUM(ia_count)
(blank) | 5133222
blue | 804197
green | 7940396
white | 484933
yellow | 1805028
QUERY: SELECT sherpa_color,
SUM(ia_count)
FROM journal
GROUP BY sherpa_color;
Homepage URL counts:
unique_urls | journals_with_homepages
115819 | 77673
QUERY: SELECT COUNT(DISTINCT surt) as unique_urls, COUNT(DISTINCT issnl) as journals_with_homepages FROM homepage;
Journals with the most homepage URLs:
issnl | COUNT(*)
0717-554X | 9
0185-2574 | 8
0328-1205 | 8
0328-3445 | 8
0379-8682 | 8
0717-5906 | 8
1246-7405 | 8
1415-6555 | 8
1641-876X | 8
1669-2381 | 8
QUERY: SELECT issnl,
COUNT(*)
FROM homepage
GROUP BY issnl
ORDER BY COUNT(*) DESC
LIMIT 10;
Top/redundant URLs and SURTs:
surt | COUNT(*)
org,rsc,pubs)/en/ebooks | 47
com,benjamins)/ | 29
ro,ubbcluj,studia)/serii/index_en.html | 22
pl,czest,ajd,bg,kernel)/wydawnictwo.php | 17
id,ac,unimed,jurnal)/ | 12
pl,edu,uwm,wydawnictwo)/artykul/14/czytelnia.html | 12
pl,krakow,up,pbc)/dlibra/pubindex?dirids=5 | 11
org,ecorfan)/bolivia/research_journals.php | 10
it,minervamedica)/index2.t | 9
kr,or,koreascience)/journal/aboutjournal.jsp | 8
QUERY: SELECT surt,
COUNT(*)
FROM homepage
GROUP BY surt
ORDER BY COUNT(*) DESC
LIMIT 10;
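The surt values above have the host labels reversed and joined with commas, followed by ")" and the path. A rough sketch of how such a key could be derived from a URL. simple_surt is a hypothetical helper, not the function that produced the report, and real SURT canonicalization has more rules than this; dropping a leading "www." is an assumption.

```python
from urllib.parse import urlsplit


def simple_surt(url: str) -> str:
    """Simplified SURT-style key: reversed host labels, then ')' and the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assumption: a leading "www." is dropped so both host variants collide.
    if host.startswith("www."):
        host = host[len("www."):]
    reversed_host = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    query = "?" + parts.query if parts.query else ""
    return f"{reversed_host}){path}{query}"


print(simple_surt("http://pubs.rsc.org/en/ebooks"))  # org,rsc,pubs)/en/ebooks
print(simple_surt("https://www.benjamins.com/"))     # com,benjamins)/
```

Because the scheme and the "www." prefix are discarded, several distinct homepage URLs can map to one key, which is why grouping by surt surfaces shared landing pages.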
What is the deal with all those "benjamins" URLs?
publisher | name
John Benjamins Publishing Company | NOWELE
(blank) | Studia Uralo-Altaica
John Benjamins Publishing Company | Language Problems and Language Planning
John Benjamins Publishing Company | Lingvisticæ investigationes
John Benjamins Publishing Company | Linguistics of the Tibeto-Burman Area
John Benjamins Publishing Company | Pragmatics & Cognition
John Benjamins Publishing Company | Terminology
John Benjamins Publishing Company | Written Language & Literacy
John Benjamins Publishing Company | FORUM: Revue internationale d'interprétation et de traduction / International Journal of Interpretation and Translation
John Benjamins Publishing Company | English Text Construction
John Benjamins Publishing Company | Constructions and Frames
John Benjamins Publishing Company | Pragmatics and Society
John Benjamins Publishing Company | Translation and Interpreting Studies
John Benjamins Publishing Company | Language and Dialogue
John Benjamins Publishing Company | Metaphor in Language, Cognition, and Communication
(blank) | Hamburg Studies on Linguistic Diversity
John Benjamins Publishing Company | Translation Spaces
(blank) | Studies in Arabic Linguistics
John Benjamins Publishing Company | Journal of Immersion and Content-Based Language Education (JICB)
(blank) | Children's Literature, Culture, and Cognition
John Benjamins Publishing Company | Journal of Language Aggression and Conflict
(blank) | FILLM Studies in Languages and Literatures
(blank) | Advances in Historical Sociolinguistics
John Benjamins Publishing Company | Linguistic Landscape
John Benjamins Publishing Company | International Journal of Learner Corpus Research
John Benjamins Publishing Company | Journal of Second Language Pronunciation
(blank) | ITL - International Journal of Applied Linguistics
John Benjamins Publishing Company | Cognitive Individual Differences in Second Language Processing and Acquisition
John Benjamins Publishing Company | Studies in Germanic Linguistics
QUERY: SELECT publisher,
name
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.surt = 'com,benjamins)/';
Domains that block us:
domain | journal_homepages | SUM(blocked)
jstor.org | 3241 | 3241
wiley.com | 2503 | 229
brill.nl | 222 | 164
bentham.org | 149 | 149
emeraldgrouppublishing.com | 76 | 76
uem.br | 39 | 34
emeraldinsight.com | 401 | 17
rodopi.nl | 19 | 15
esaunggul.ac.id | 11 | 11
ingentaconnect.com | 120 | 9
ucb.br | 11 | 9
univie.ac.at | 45 | 9
erlbaum.com | 9 | 8
iaster.com | 7 | 7
ucla.edu | 32 | 7
uctjournals.com | 7 | 7
elsevier.com | 2861 | 6
inah.gob.mx | 7 | 5
medicaljournals.se | 5 | 5
mohr.de | 13 | 5
QUERY: SELECT domain,
COUNT(*) as journal_homepages,
SUM(blocked)
FROM homepage
GROUP BY domain
ORDER BY SUM(blocked) DESC
LIMIT 20;
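The ranking above is by absolute number of blocked homepages, which mixes domains that block everything (jstor.org, 3241 of 3241) with domains that block only a few (elsevier.com, 6 of 2861). A minimal sketch that adds the blocked fraction per domain, on invented rows. It assumes blocked is stored as 0 or 1, which SUM(blocked) suggests but the report does not confirm.

```python
import sqlite3

# Toy `homepage` table; domains and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homepage (domain TEXT, blocked INTEGER)")
conn.executemany(
    "INSERT INTO homepage VALUES (?, ?)",
    [
        ("a.example", 1), ("a.example", 1),  # every homepage blocked
        ("b.example", 1), ("b.example", 0),  # one of four blocked
        ("b.example", 0), ("b.example", 0),
    ],
)

blocked_by_domain = conn.execute(
    """
    SELECT domain,
           COUNT(*) AS journal_homepages,
           SUM(blocked) AS blocked_count,
           SUM(blocked) * 1.0 / COUNT(*) AS blocked_frac
    FROM homepage
    GROUP BY domain
    ORDER BY blocked_frac DESC, blocked_count DESC
    """
).fetchall()

print(blocked_by_domain)
# [('a.example', 2, 2, 1.0), ('b.example', 4, 1, 0.25)]
```

Multiplying by 1.0 forces floating-point division; with two integer operands SQLite would truncate the ratio to 0 or 1.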
Top duplicated URLs:
QUERY: SELECT url,
COUNT(*)
FROM homepage
GROUP BY url
ORDER BY COUNT(*) DESC
LIMIT 20;
Number of journals with a homepage that points to web.archive.org or archive.org:
COUNT(DISTINCT issnl)
164
QUERY: SELECT COUNT(DISTINCT issnl)
FROM homepage
WHERE domain = 'archive.org';
Top publishers that have journals in wayback:
publisher | COUNT(*)
(blank) | 39
EDP Sciences | 11
PERSEE Program | 3
CAIRN | 2
Fabula | 2
Institut du monde et du développement pour la bonne gouvernance publique | 2
ANPAD | 1
ANR le Saint-Simonisme 18-21 | 1
Ad hoc (Rennes) | 1
Al Manhal FZ, LLC | 1
QUERY: SELECT publisher,
COUNT(*)
FROM journal
LEFT JOIN homepage ON journal.issnl = homepage.issnl
WHERE homepage.domain = 'archive.org'
GROUP BY journal.publisher
ORDER BY COUNT(*) DESC
LIMIT 10;
Top publishers by number of journals missing a homepage:
publisher | COUNT(*)
(blank) | 34205
Peter Lang International Academic Publishers | 1309
Elsevier | 1036
Informa UK (Taylor & Francis) | 650
Springer-Verlag | 465
OMICS Publishing Group | 413
Georg Thieme Verlag KG | 357
Wiley (John Wiley & Sons) | 330
Egypts Presidential Specialized Council for Education and Scientific Research | 266
Science Publishing Group | 260
SAGE Publications | 256
Al Manhal FZ, LLC | 250
Bentham Science | 214
Wiley (Blackwell Publishing) | 212
Medknow Publications | 203
Inderscience Enterprises Ltd | 170
African Journals Online | 166
Diva Enterprises Private Limited | 166
PERSEE Program | 142
Scientific Research Publishing, Inc | 139
QUERY: SELECT publisher,
COUNT(*)
FROM journal
WHERE any_homepage=0
GROUP BY publisher
ORDER BY COUNT(*) DESC
LIMIT 20;