Title: About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature
Authors: Tantoso, Erwin
Eisenhaber, Birgit
Sinha, Swati
Jensen, Lars Juhl
Eisenhaber, Frank
Keywords: Science::Biological sciences
Issue Date: 2023
Source: Tantoso, E., Eisenhaber, B., Sinha, S., Jensen, L. J. & Eisenhaber, F. (2023). About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biology Direct, 18(1), 7.
Journal: Biology Direct 
Abstract: Background: Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of the life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims to quantify how well the scientific literature illuminates the E. coli gene function space and how close we are to a complete list of E. coli gene functions. Results: The scientific literature about E. coli protein-coding genes has been mapped onto the genome via mentions of names for genomic regions in scientific articles, both for the strain K-12 MG1655 and for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genomes. The article match was quantified as the ratio of a given gene name’s occurrences to the mentions of any gene names in the paper. Literature coverage of the various genome regions is extremely uneven. A group of elite genes with ≥ 100 full publication equivalents (FPEs; FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion’s share of the papers. For K-12, ~ 65% of the literature covers just 342 elite genes; for the softcore genome, ~ 68% of the FPEs concern only 342 elite gene families (GFs). We also find that most genes/GFs have at least one mention in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005–2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for genes that were already well studied, at threshold T10 (≥ 10 FPEs), started to grow explosively. Typically, a body of ~ 20 actual articles generated over ~ 15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli, together with genomic co-localization and sequence-analytic exploration, hints that the previously completely uncharacterized genes yahV and yddL are associated with osmotic stress response/motility mechanisms. Conclusion: If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are within reach in the next 25–30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears to be overcome and steady incremental research becomes possible.
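Illustrative note: the FPE measure described in the abstract can be read as a per-article fractional count. The sketch below is a minimal illustration under that reading, not the authors' pipeline. The function name `fpe_per_gene`, the input layout (one list of gene-name mentions per article), and the gene names `geneA`, `geneB`, `geneC` are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def fpe_per_gene(papers):
    """Sum, over all articles, each gene's share of that article's gene-name mentions.

    `papers` is an iterable of lists; each list holds the gene names
    mentioned in one article (repeats allowed). This layout is an assumption.
    """
    fpe = defaultdict(float)
    for mentions in papers:
        counts = Counter(mentions)
        total = sum(counts.values())
        if total == 0:
            continue  # an article without gene names contributes nothing
        for gene, n in counts.items():
            # Ratio of this gene's occurrences to all gene-name mentions
            fpe[gene] += n / total
    return dict(fpe)


# Toy data with hypothetical gene names, not taken from the paper
papers = [
    ["geneA", "geneA", "geneA"],           # devoted to one gene: contributes 1 FPE
    ["geneA", "geneB"],                    # split evenly between two genes
    ["geneB", "geneC", "geneC", "geneC"],  # mostly about geneC
]
scores = fpe_per_gene(papers)
for gene in sorted(scores):
    print(gene, scores[gene])
```

Under this reading, every article that mentions at least one gene contributes exactly 1 FPE in total, so the per-gene scores sum to the number of such articles.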
ISSN: 1745-6150
DOI: 10.1186/s13062-023-00362-0
Schools: School of Biological Sciences 
Organisations: Genome Institute of Singapore, A*STAR 
Bioinformatics Institute, A*STAR 
Rights: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver (http://creativeco applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SBS Journal Articles

Files in This Item:
File: s13062-023-00362-0.pdf
Size: 2.42 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.