# DOAJ Citation Graph

Based on Refcat (v1), the Internet Archive (IA) Scholar Citation Graph.

> 2021-10-27

## Basic numbers

We started with a set of 4,887,241 DOIs from DOAJ; after normalization, we
found 4,773,245 matching metadata records in the https://fatcat.wiki catalog.

|                                                                   | count         |
|---------------------------------------------------------------    |-------------  |
| matched edges                                                     | 124,760,397   |
| matched edges (by identifier)                                     | 118,314,316   |
| matched edges (by fuzzy matching)                                 | 6,446,081     |
| citations **from** a DOAJ document                                | 98,616,033    |
| citations having a DOAJ document as **target**                    | 34,910,769    |
| citations where source and target are in DOAJ (**intra-DOAJ**)    | 8,766,405     |
| unique source documents (all)                                     | 12,730,677    |
| unique source documents (DOAJ)                                    | 3,471,878     |
| unique target documents (all)                                     | 24,331,406    |
| unique target documents (DOAJ)                                    | 2,678,972     |

In words:

For 72% of the DOAJ documents, we have recorded at least one reference to a
target document, and for 56% of the DOAJ documents, we have recorded at least
one citation pointing to them.

About 7% of the citations we find are intra-DOAJ, that is, both the citing and
the cited article are in DOAJ.
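
The percentages above can be rechecked from the table. The following is a minimal sketch, assuming that the 72% and 56% figures are taken relative to the 4,773,245 DOAJ records found in fatcat and truncated to whole percent, and that the 7% figure is relative to all matched edges. The constant names are made up for illustration.

```python
# Counts copied from the "Basic numbers" section above.
DOAJ_RECORDS = 4_773_245  # DOAJ records found in fatcat after normalization
MATCHED_EDGES = 124_760_397
EDGES_BY_IDENTIFIER = 118_314_316
EDGES_BY_FUZZY = 6_446_081
EDGES_FROM_DOAJ = 98_616_033
EDGES_TO_DOAJ = 34_910_769
EDGES_INTRA_DOAJ = 8_766_405
UNIQUE_SOURCES_DOAJ = 3_471_878
UNIQUE_TARGETS_DOAJ = 2_678_972


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Identifier matches and fuzzy matches add up to all matched edges.
assert EDGES_BY_IDENTIFIER + EDGES_BY_FUZZY == MATCHED_EDGES

# Inclusion-exclusion: every matched edge has a DOAJ document on at
# least one side, and intra-DOAJ edges are counted on both sides.
assert EDGES_FROM_DOAJ + EDGES_TO_DOAJ - EDGES_INTRA_DOAJ == MATCHED_EDGES

# Assumed denominators, see the note above this block.
with_reference = share(UNIQUE_SOURCES_DOAJ, DOAJ_RECORDS)
with_citation = share(UNIQUE_TARGETS_DOAJ, DOAJ_RECORDS)
intra_doaj = share(EDGES_INTRA_DOAJ, MATCHED_EDGES)

print(f"DOAJ documents with at least one reference: {with_reference:.1f}%")
print(f"DOAJ documents with at least one citation:  {with_citation:.1f}%")
print(f"intra-DOAJ share of matched edges:          {intra_doaj:.1f}%")
```

The two assertions also show that the table is internally consistent: the identifier and fuzzy counts sum to the total, and the "from", "target" and "intra-DOAJ" rows cover all matched edges exactly once.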

## Charts

The most referenced articles in this dataset are:

| Cited By  | Fatcat Release Identifier     | Title                                                                                                                                                     |
|---------- |----------------------------   |---------------------------------------------------------------------------------------------------------------------------------------------------------  |
| 27043     | pedretid7rd6xknd6gsrrh3wum    | A short history of SHELX                                                                                                                                  |
| 26974     | hzhcy7rsoravrilgyhzohwlmai    | Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement                                                                  |
| 22543     | fiqrt3cc5jgupls3fvroghzb4y    | Fitting Linear Mixed-Effects Models Using lme4                                                                                                            |
| 19735     | 4dxke54hnjh4nmsjbrrlu2o5zq    | Self-efficacy: Toward a unifying theory of behavioral change.                                                                                             |
| 17670     | 3zmp4orkdff7tk3tc3q7hvyvay    | Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2−ΔΔCT Method                                                          |
| 16186     | bdsantixljesjkofonh3oqalzq    | The Achromatic Interfero Coronagraph                                                                                                                      |
| 8758      | jubvkngt7zflbfkwsff44fxa6q    | BEAST: Bayesian evolutionary analysis by sampling trees                                                                                                   |
| 8713      | ztl7z2e3engvtad4l5qhldmd64    | Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries                                    |
| 8646      | ctdiwqadirftjgu77untvwbpiu    | A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding                       |
| 8195      | 5dcgafogfvg4tfqqhobidybpna    | Basic local alignment search tool                                                                                                                         |
| 7741      | 27tkrqbmjrfctnhmodskvwhhqa    | RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome                                                             |
| 7488      | fyhpfh5lkjgl7ewr7pcgrzekha    | Structure validation in chemical crystallography                                                                                                          |
| 7266      | tdsusrfiuzcqxnnlbmm6uzyh4m    | The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration   |
| 7242      | qhqpojpbuvh4zffs4dvqs4beyi    | BLAST+: architecture and applications                                                                                                                     |
| 7085      | joktmyyu5vdv3kuxm42zzqhn3e    | Hallmarks of Cancer: The Next Generation                                                                                                                  |
| 6934      | xku5g3hmm5eangsczpzjrctd7e    | Gapped BLAST and PSI-BLAST: a new generation of protein database search programs                                                                          |
| 6806      | srzvnzj7rvbbhig37uw6vh6m4u    | The Sequence Alignment/Map format and SAMtools                                                                                                            |
| 6685      | tgwxkq5jnjfc3eu3zpycilq7xm    | Using thematic analysis in psychology                                                                                                                     |
| 6554      | 5g42373tjfecxp44yqns7qwzoe    | The RAST Server: Rapid Annotations using Subsystems Technology                                                                                            |
| 6489      | wbkhvqxm2napppgmaxin66upgm    | WGCNA: an R package for weighted correlation network analysis                                                                                             |
| 6215      | atq75qnkkzdadbhaslevbmdlaq    | Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2                                                                           |
| 6192      | jhoeu43y7rhoxd5eaw3dqzc4tm    | Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing                                                               |
| 6124      | pebeuwozure4xiaygfs6om4fya    | Arlequin (version 3.0): An integrated software package for population genetics data analysis                                                              |
| 5900      | ym7irtp4dveurpinpuyfjjdyuu    | FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments                                                                                  |
| 5894      | dy3dpacbd5a6dag42nnsjh3pte    | Fast gapped-read alignment with Bowtie 2                                                                                                                  |
| 5891      | nm4tov3wxndjjjpnyoqe5lirom    | MUSCLE: multiple sequence alignment with high accuracy and high throughput                                                                                |
| 5861      | tcwbgpm3kfbnxk3lhwgsaswmrm    | Trimmomatic: a flexible trimmer for Illumina sequence data                                                                                                |
| 5853      | j5bjclahkjfxtm6px3germagpm    | MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0                                                                                               |
| 5818      | nttk476glncuhbuy4vvskrwfoi    | Projections of Global Mortality and Burden of Disease from 2002 to 2030                                                                                   |
| 5644      | 7bsqead3n5he3gmbzkmfetdj3e    | MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets                                                                           |

The most referenced articles belonging to DOAJ:

| Cited By  | Fatcat Release Identifier     | Title                                                                                                                                                                                                     |
|---------- |----------------------------   |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    |
| 257       | 42lwecjh4nhjbbfx5j6feoy4re    | Evidence for large domains of similarly expressed genes in the <it>Drosophila </it> genome                                                                                                                |
| 254       | ns4v2jvhgbhh7mbg45bjtpzway    | A new natural hybrid of Iris (Iridaceae) from Chongqing, China                                                                                                                                            |
| 220       | yqhzw62yhbd4xnfm6qkplk5gky    | Three new subterranean species of Baezia (Curculionidae, Molytinae) for the Canary Islands                                                                                                                |
| 206       | fbr3cmn7svdyrk2de74p4ibhra    | Dwarfs of the fortress: A new cryptic species of dwarf gecko of the genus Cnemaspis Strauch, 1887 (Squamata, Gekkonidae) from Rajgad fort in the northern Western Ghats of Maharashtra, India             |
| 187       | vwbvqztj7zbznhejreb6nmkghq    | A role for <it>cryptochromes</it> in sleep regulation                                                                                                                                                     |
| 187       | pv7gwyji7nbh7et776cnmdok3a    | A new species of day gecko (Reptilia, Gekkonidae, Cnemaspis Strauch, 1887) from Sri Lanka with an updated ND2 gene phylogeny of Sri Lankan and Indian species                                             |
| 164       | x3ahxq56c5bwrlf6pkmpmohqmm    | The laboratory rat: Relating its age with human's                                                                                                                                                         |
| 162       | 75iqitudtbcrxn73dvwv7vka5m    | On the Generalized Distance in Statistics                                                                                                                                                                 |
| 157       | rdk724wf75ddhc5qszf53jytuy    | Immunocytochemical evidence for co-expression of Type III IP<sub>3</sub> receptor with signaling components of bitter taste transduction                                                                  |
| 142       | dlnghkvx7bgotfftqvp5rsgeg4    | Reactivation of a silenced <it>H19</it> gene in human rhabdomyosarcoma by demethylation of DNA but not by histone hyperacetylation                                                                        |
| 126       | evrrqdpegnhvpggmnxtdxjdnou    | Frequent Promoter Methylation of <it>CDH1, DAPK, RARB</it>, and <it>HIC1 </it>Genes in Carcinoma of Cervix Uteri: Its Relationship to Clinical Outcome                                                    |
| 122       | p43ke27vpff6lcakjy4zchczhy    | A tandem repeats database for bacterial genomes: application to the genotyping of <it>Yersinia pestis</it> and <it>Bacillus anthracis</it>                                                                |
| 119       | 4vipha52brfmpk5ydwb2tqbxh4    | Dividend Policy Growth and the Valuation of Shares                                                                                                                                                        |
| 117       | oj66fyr4nncipn4rmc77px7q2y    | PGC-1alpha Deficiency Causes Multi-System Energy Metabolic Derangements: Muscle Dysfunction, Abnormal Weight Control and Hepatic Steatosis                                                                |
| 114       | fdeqimfgg5ac7e6tqeov3lnkb4    | The molecular genetic linkage map of the model legume <it>Medicago truncatula</it>: an essential tool for comparative legume genomics and the isolation of agronomically important genes                  |
| 112       | i4rp4yjw3bd6taihp3gkvjln2a    | Aprendendo a entrevistar: como fazer entrevistas em Ciências Sociais                                                                                                                                      |
| 111       | ix3qnhyhovbwxiwycgcqofdrje    | Malarone treatment failure and <it>in vitro</it> confirmation of resistance of <it>Plasmodium falciparum</it> isolate from Lagos, Nigeria                                                                 |
| 110       | yd7hojmywvexrpcoyql2bnlhyi    | OPERATIONAL EARTHQUAKE FORECASTING. State of Knowledge and Guidelines for Utilization                                                                                                                     |
| 101       | uqjudwtgjngbtpr3ey3fog3roa    | Italian Privileges and Trade in Byzantium before the Fourth Crusade: A Reconsideration                                                                                                                    |
| 101       | 6b6rdxf6fve6rj6pc7a7er4mfe    | Speciation and phylogeography in the cosmopolitan marine moon jelly, <it>Aurelia</it> sp                                                                                                                  |
| 100       | njdobqruvzgabdzbifrbtfnhye    | The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs: Correction                                                |
| 98        | dovwbef4crar5dvpbgbsnmjegu    | Antispilina ludwigi Hering, 1941 (Lepidoptera, Heliozelidae) a rare but overlooked European leaf miner of Bistorta officinalis (Polygonaceae): new records, redescription, biology and conservation       |
| 97        | tlewnaq64zdclbzdva7vd3pjy4    | Beyond Empathy. Phenomenological Approaches to Intersubjectivity                                                                                                                                          |
| 95        | 4xwg6e5qpnfuxmobn3ntj4beyi    | Knowledge and attitude toward COVID-19 among healthcare workers at District 2 Hospital, Ho Chi Minh City                                                                                                  |
| 93        | 5ajqrpxqdjdzhahzkpqcqyp754    | HPLC-DAD-ESI-MSn identification of phenolic compounds in cultivated strawberries from Macedonia                                                                                                           |
| 91        | nofyxyrfcjclhayeeymhtyaeia    | Biofilm formation by nontypeable <it>Haemophilus influenzae:</it> strain variability, outer membrane antigen expression and role of pili                                                                  |
| 86        | tgrf2rfdjvhv3h2j55gydzjwiu    | Labiobaetis Novikova & Kluge in West Africa (Ephemeroptera, Baetidae), with description of a new species                                                                                                  |
| 79        | z33rdxu3cnh65lbaze44nfi6cm    | Molecular phylogeny of Subtribe Artemisiinae (Asteraceae), including <it>Artemisia</it> and its allied and segregate genera                                                                               |
| 79        | r2acgmnjlfcpjalpsaw6srcq5y    | Haplotype analysis of the PPARγ Pro12Ala and C1431T variants reveals opposing associations with body weight                                                                                               |

## Glossary

### Edge

An edge connects a source metadata document with a target metadata document
(from the fatcat catalog) and records a certain or highly likely citation of
the target document in the source document.

We also record (and display) unmatched references, that is, reference
information from a source that has not yet been matched to a target. These are
sometimes called "unmatched refs".
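
As an illustration of how the edge counts in "Basic numbers" relate to this definition, here is a minimal sketch. The `Edge` class and the `count_edges` helper are hypothetical simplifications, not the actual Refcat schema or tooling, and the identifiers in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Edge:
    """Hypothetical, simplified edge; not the actual Refcat schema."""

    source: str  # identifier of the citing document
    target: str  # identifier of the cited document


def count_edges(edges: Iterable[Edge], doaj: Set[str]) -> Dict[str, int]:
    """Count edges from, to, and within a set of DOAJ identifiers."""
    counts = {"from_doaj": 0, "to_doaj": 0, "intra_doaj": 0}
    for edge in edges:
        source_in_doaj = edge.source in doaj
        target_in_doaj = edge.target in doaj
        if source_in_doaj:
            counts["from_doaj"] += 1
        if target_in_doaj:
            counts["to_doaj"] += 1
        if source_in_doaj and target_in_doaj:
            # Intra-DOAJ edges are also counted in both categories above.
            counts["intra_doaj"] += 1
    return counts


# Made-up identifiers: "a" and "b" are in DOAJ, "x" is not.
example_edges = [Edge("a", "b"), Edge("a", "x"), Edge("x", "b")]
example_counts = count_edges(example_edges, doaj={"a", "b"})
print(example_counts)
```

Because an intra-DOAJ edge is counted once as "from" and once as "to", the number of distinct edges touching DOAJ is `from_doaj + to_doaj - intra_doaj`.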

### Fatcat.wiki

The catalog underlying Internet Archive Scholar.

### Internet Archive Scholar

A search engine over 100M metadata records and over 30M fulltext documents,
updated in near real-time as new metadata and fulltext documents become
available in fatcat.

### Internet Archive Scholar Citation Graph

A citation graph derived from scholarly metadata and fulltext documents curated
at the Internet Archive. Version 1 was released in October 2021. Further
information can be found here:

* https://guide.fatcat.wiki/reference_graph.html
* https://blog.archive.org/2021/10/19/internet-archive-releases-refcat-the-ia-scholar-index-of-over-1-3-billion-scholarly-citations/
* https://arxiv.org/abs/2110.06595