🧬

Molecular Biology Foundations

Core mechanisms of life at the molecular scale

A curated set of 10 distilled sources covering DNA replication, gene regulation, protein synthesis, cellular signaling, and modern techniques like CRISPR. Designed for professionals seeking rigorous, source-backed understanding of biological systems without medical or applied-health framing.

10 documents · sourced from Nobuyuki Ota / arXiv 2603.23361v2 · Sepehr Ehsani / Macro-trends in research on the central dogma of molecular biology / arXiv 1301.2397v2 · Semi-conservative DNA replication description (Perplexity web research) · Alexander S. Serov · Web research on pre-mRNA processing (Perplexity) · Olivier Dauloudet et al. / arXiv 2009.14533v2 · Dagmar Iber / A quantitative study of the benefits of co-regulation using the spoIIA operon as an example / q-bio/0607008v1 · Alejandro Saettone et al. · Oufan Zhang · A Note on Elementary Cellular Automata Classification


What’s inside

Fundamental Building Blocks: DNA, RNA, and Proteins

Nobuyuki Ota / arXiv 2603.23361v2

Computational approaches are advancing predictions of RNA and protein structures central to molecular biology processes. One effort introduces a scoring function for RNA 3D structure prediction from sequence alone and analyzes its performance strengths and shortcomings against state-of-the-art techniques as a foundation for improved methods. Separate work reviews molecular dynamics simulations of chemically modified ribonucleotides that employ force fields, enhanced sampling, and alchemical conversions of wild-type nucleotides to modified forms, granting access to three-dimensional structural dynamics and protein-binding effects beyond secondary-structure impacts. An AI architecture termed CDT-III models the central dogma across DNA, RNA, and protein via a two-stage Virtual Cell Embedder separating nuclear transcription from cytosolic translation, attaining per-gene correlations of 0.843 for RNA and 0.969 for protein on five held-out genes while showing that protein supervision raises RNA performance from 0.804 to 0.843 and boosts CTCF enrichment by 30 percent at the DNA level; it also correctly predicts both mRNA and protein responses despite 66.7 percent of genes exhibiting opposite directional changes at absolute log2 fold-change above 0.01. A further protocol adapts an RNA stepwise ansatz to protein loop modeling inside Rosetta, recovering sub-Angstrom crystallographic accuracy for 19 of 20 loops in established benchmarks through residue-by-residue enumeration at costs of thousands of CPU-hours for 12-residue cases.

The Central Dogma of Molecular Biology

Sepehr Ehsani / Macro-trends in research on the central dogma of molecular biology / arXiv 1301.2397v2

The central dogma of molecular biology, as formulated by Francis Crick, states that sequence information, once passed into protein, cannot flow back to nucleic acid; the general route runs from DNA to RNA to protein. Information is defined strictly as the precise monomer sequence of bases in DNA or RNA and amino acids in proteins, permitting transfers among nucleic acids or from nucleic acid to protein but never from protein back to nucleic acid. DNA functions as the stable repository that is replicated for inheritance across generations. Transcription copies one DNA strand into messenger RNA via RNA polymerase, producing a complementary base sequence that carries the gene information to ribosomes. Translation then reads this mRNA in codon triplets, each directing a specific amino acid delivered by transfer RNA; the ribosome assembles these into a polypeptide whose linear order follows the genetic code and ultimately dictates protein structure and cellular functions. The accompanying web research confirms this colinear, directional mapping with no flow from protein back to nucleic acid, while arXiv 1705.09868v1 situates the process within broader biological flows of materials, energy, and information, and arXiv 1301.2397v2 documents the historical dominance of protein-centric studies that later converged with DNA-focused work. arXiv 2508.04085v1 hypothesizes an evolutionary origin through spontaneous symmetry breaking that separates information transmission by nucleic acids from expression by proteins, offering a framework for analogous divisions observed across biological scales.
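The transcription and translation steps described above can be illustrated with a minimal sketch. It assumes the standard genetic code, a template strand written 3′ to 5′, and translation that starts at the first AUG and stops at the first stop codon; the function names and the example sequence are hypothetical and not taken from the cited sources.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA template base -> complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Return the mRNA (5'->3') complementary to a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def translate(mrna):
    """Read codon triplets from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


template = "TACCGATTTACCATT"   # hypothetical template strand, 3'->5'
mrna = transcribe(template)
print(mrna)                    # AUGGCUAAAUGGUAA
print(translate(mrna))         # MAKW
```

The sketch only maps sequence to sequence, which is the sense of "information" used in the dogma; it says nothing about folding, regulation, or kinetics.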

Mechanisms of DNA Replication

Semi-conservative DNA replication description (Perplexity web research)

Semi-conservative DNA replication produces daughter duplexes each containing one parental strand and one newly synthesized strand because the two parental strands separate and each acts as template. Origin recognition proteins recruit helicase, which unwinds the helix to create the replication fork and expose single-stranded templates. Topoisomerase relieves torsional strain ahead of the fork while single-strand binding proteins stabilize the exposed DNA. Primase synthesizes short RNA primers that supply the 3′-OH required by DNA polymerase. DNA polymerase then extends chains exclusively in the 5′ to 3′ direction by catalyzing nucleophilic attack of the primer 3′-OH on incoming dNTPs, guided by Watson-Crick base pairing. In bacteria this elongation is performed mainly by DNA polymerase III, while DNA polymerase I later removes RNA primers and fills gaps. The antiparallel template strands therefore produce continuous leading-strand synthesis and discontinuous lagging-strand synthesis. Overall fidelity arises from base-selective polymerization, polymerase 3′→5′ exonuclease proofreading, and post-replicative mismatch repair, enabling complete genome duplication within the observed cell-cycle timing.
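The semi-conservative pattern can be sketched by tagging each strand with the round in which it was synthesized. This is a toy bookkeeping model under simplifying assumptions: strands are plain strings written 5′ to 3′, synthesis is error-free, and primers, forks, and Okazaki fragments are ignored. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesize(template):
    """Return the new strand, written 5'->3', templated by `template` (5'->3').

    Complementing and then reversing reflects the antiparallel pairing.
    """
    return template.translate(COMPLEMENT)[::-1]


def replicate(duplex, generation):
    """Split a duplex and pair each old strand with a newly made partner.

    A duplex is ((top_seq, born), (bottom_seq, born)), where `born` is the
    replication round in which that strand was synthesized (0 = parental).
    """
    (top, top_born), (bottom, bottom_born) = duplex
    return [
        ((top, top_born), (synthesize(top), generation)),
        ((synthesize(bottom), generation), (bottom, bottom_born)),
    ]


def has_parental_strand(duplex):
    return any(born == 0 for _, born in duplex)


parent = (("ATGGCA", 0), (synthesize("ATGGCA"), 0))
first_round = replicate(parent, 1)
second_round = [d for duplex in first_round for d in replicate(duplex, 2)]

# After one round every daughter holds one parental and one new strand;
# after two rounds only two of the four duplexes still hold a parental strand.
print([sorted(born for _, born in d) for d in first_round])
print(sum(has_parental_strand(d) for d in second_round), "of", len(second_round))
```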

Transcription Initiation and Regulation

Alexander S. Serov, Alexander J. Levine, Madhav Mani / arXiv 1701.06079v1

Transcription initiation requires RNA polymerase to bind promoter DNA after transcription factors recognize specific sequences and assemble complexes that recruit and position the enzyme while modulating every stage, from open-complex formation through promoter escape, elongation, and pausing. In bacteria, the core enzyme associates with sigma factors to select the conserved -35 and -10 elements before melting DNA into the open state that exposes the template for initial nucleotide addition. Eukaryotic control incorporates ATP-dependent chromatin remodeling that generates non-equilibrium cooperativity among transcription factors by preventing trapping and increasing promoter responsiveness to combinations of regulators. Simplified initiation models that include polymerase pausing produce population-level behaviors, such as rapid synchronized responses to environmental shifts or persistent memory of prior transcriptional states, depending on the chosen control logic. Live imaging of the early Drosophila embryo demonstrates that maximal transcription rates for Hunchback, Snail, and Knirps are identical across nuclear cycles yet reach only 40 percent of the value expected from bulk polymerase traffic-jam models, implying that slower elongation near the promoter shifts the bottleneck from downstream DNA to the initiation region itself. Geometrical analysis further shows that B-DNA must maintain a pitch angle below the zero-twist value of 41.8 degrees, with empirical measurements confirming approximately 38 degrees, to allow transcription without prohibitive rotational strain.
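Promoter selection through the -35 and -10 elements can be pictured as a simple sequence scan. The sketch below is illustrative only: the consensus hexamers TTGACA and TATAAT and the 16 to 18 bp spacer are the commonly quoted values for the E. coli sigma-70 promoter, brought in here as assumptions rather than taken from the cited paper, and the function names, mismatch threshold, and test sequence are hypothetical. Real promoter recognition depends on far more than hexamer matching.

```python
MINUS_35 = "TTGACA"   # assumed sigma-70 consensus for the -35 element
MINUS_10 = "TATAAT"   # assumed sigma-70 consensus for the -10 element


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacers=(16, 17, 18)):
    """Return (start_of_-35, start_of_-10) index pairs for candidate promoters."""
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS_10) <= max_mismatch:
                hits.append((i, j))
    return hits


# Hypothetical sequence: a perfect -35 box, a 17 bp spacer, a perfect -10 box.
sequence = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GGGG"
print(find_promoters(sequence))   # [(2, 25)]
```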

RNA Processing and Splicing

Web research on pre-mRNA processing (Perplexity)

Pre-mRNA processing relies on three tightly coupled systems that operate alongside RNA polymerase II transcription: the spliceosome for intron removal, the 5′ capping machinery that modifies the nascent transcript, and the cleavage/polyadenylation machinery that forms the 3′ end. Splicing proceeds through two transesterification reactions that generate a lariat intermediate before exon ligation, with the chemistry driven by divalent metal ions coordinated by U6 snRNA, including at least two Mg²⁺ ions or a four-metal-ion cluster containing catalytic Mg²⁺ and structural K⁺. Assembly begins when U1 snRNP recognizes the 5′ splice site and U2 snRNP binds the branchpoint, after which the U4/U6·U5 tri-snRNP joins; multiple RNA-dependent ATPases and helicases then remodel the complexes. Accuracy and alternative splicing are governed by cis-acting elements at splice sites and branchpoints together with trans-acting proteins such as SR proteins and hnRNPs, while chromatin state and polymerase elongation rate further modulate exon choice. Splicing itself can occur co-transcriptionally through direct physical and kinetic links to the elongating polymerase. Capping occurs during transcription and is coupled to elongation and reinitiation, whereas 3′-end cleavage and poly(A) addition are functionally connected to both transcription termination and splicing.

Translation and Protein Synthesis

Olivier Dauloudet et al. / arXiv 2009.14533v2

Mathematical models demonstrate how ribosome dynamics govern protein synthesis rates during mRNA translation. The Ribosome Transport model with Diffusion couples a Totally Asymmetric Simple Exclusion Process to a finite diffusive reservoir and yields an analytical expression for synthesis rate as a function of the ribosome diffusion constant, confirmed by continuous-time Monte Carlo simulations; under biologically relevant parameters this shows cytoplasmic diffusion is rapid enough that it exerts no control over initiation, as derived in Dauloudet et al. (arXiv 2009.14533v2). Separate analysis of ribosome flow on linear and circular mRNAs proves that steady-state production is maximized at a ribosomal density equal to half the maximum packing density, a result obtained through exact mathematical treatment by Zarai et al. (arXiv 1607.04064v1). Transcript length further modulates efficiency because proximity of the 3′ end to the recruitment site creates a recycling feedback that elevates ribosome loading, an outcome predicted by kinetic models that incorporate diffusion, circularization, and drop-off and shown to reproduce observed ribosome-density versus gene-length trends (Fernandes et al., arXiv 1702.00632v3). A unified framework that merges mutual exclusion with explicit chemo-mechanical steps inside each ribosome additionally forecasts both synthesis rates and spatial density profiles on E. coli transcripts, generating experimentally testable changes when individual rate constants are varied (Basu and Chowdhury, arXiv physics/0608098v3). These frameworks together indicate that cells can tune translation through diffusion-independent density optimization and length-dependent recycling.
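The half-density optimum can be illustrated with the simplest mean-field form of the exclusion-process current, J(ρ) = k·ρ·(1 − ρ), where ρ is the ribosome density normalized so that maximum packing equals 1. This is a sketch under that assumption, not the exact treatment of the cited work; the function names, the hop rate `k`, and the grid size are hypothetical.

```python
def mean_field_current(density, hop_rate=1.0):
    """Mean-field flux: a ribosome advances only if the next site is free."""
    return hop_rate * density * (1.0 - density)


def best_density(n_grid=1001):
    """Scan normalized densities in [0, 1] and return the one with maximal flux."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    return max(grid, key=mean_field_current)


print(best_density())             # 0.5
print(mean_field_current(0.5))    # 0.25
```

The same result follows analytically from dJ/dρ = k(1 − 2ρ) = 0. Below the optimum, output is limited by too few ribosomes; above it, by crowding.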

Prokaryotic Gene Regulation

Dagmar Iber / A quantitative study of the benefits of co-regulation using the spoIIA operon as an example / q-bio/0607008v1

Bacteria organize functionally related genes into operons transcribed as polycistronic mRNA from one promoter, enabling coordinated expression. Negative control occurs when repressors bind operators to block transcription of structural genes, whereas positive control involves activators that enhance RNA polymerase recruitment to promoters. Inducible systems remain off until an inducer molecule switches them on, while repressible systems stay active until a corepressor or metabolic end product turns them off. Attenuation provides an additional layer, as seen when metabolite levels dictate early termination in the trp operon. In the spoIIA operon of Bacillus subtilis, operon organization combined with translational coupling counters inherent stochastic noise in gene expression that would otherwise skew essential protein ratios during sporulation; quantitative modeling of the sigmaF network shows this arrangement measurably improves fitness and survival. Horizontal gene transfer supplies a substantial fraction of operon genes, accounting for at least 5.5 percent of the genome in Escherichia, Shigella, and Salmonella, with roughly 46 percent of those transfers originating from other gamma-proteobacteria; a cluster-based method minimizing insertion-deletion events reliably identifies these acquisitions and links them to operon assembly. The lac operon illustrates combined regulation, where LacI blocks transcription without lactose and CAP-cAMP activates it under low-glucose conditions.
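The combined negative and positive control of the lac operon reduces to a small truth table. The sketch below is a qualitative Boolean caricature under the assumption that lactose and glucose are either present or absent; the function name and the output labels "off", "low", and "high" are hypothetical.

```python
def lac_operon_state(lactose, glucose):
    """Qualitative lac operon output for given sugar availability."""
    repressor_bound = not lactose   # LacI blocks transcription without lactose
    cap_active = not glucose        # CAP-cAMP activates under low glucose
    if repressor_bound:
        return "off"
    return "high" if cap_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} ->",
              lac_operon_state(lactose, glucose))
```

Expression is strongest only when lactose is present and glucose is scarce, which is the combined regulation the paragraph describes.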

Eukaryotic Gene Regulation

Alejandro Saettone et al. / arXiv 1803.08575v1

Eukaryotic gene regulation coordinates promoters that recruit RNA polymerase II and basal machinery near transcription start sites marked by H3K4me3, enhancers that stimulate initiation and elongation from variable distances via chromatin looping, and chromatin states that govern accessibility through open configurations, DNase hypersensitivity, and marks such as H3K27ac. These elements interact in cell-type-specific patterns driven by transcription factor combinations, with the sequence proceeding from chromatin opening to factor binding, looping, co-activator recruitment, and productive elongation. ATP-dependent remodelers enable non-equilibrium mechanisms, as affinity purification-mass spectrometry in Tetrahymena thermophila identifies an 11-subunit SWI/SNF complex localizing exclusively to the active macronucleus, with the bromodomain protein Ibd1 associating during growth to connect SWI/SNF to additional complexes including a putative H3K4 methyltransferase. Stochastic models of nonequilibrium fluctuations reveal that slow chromatin transitions in strongly nonadiabatic regimes maximize entropy production through circular probability currents, while milder regimes produce hysteresis in which chromatin changes precede transcriptional shifts, as illustrated in circuits of three core mouse embryonic stem cell genes. ATP dependence further permits hierarchical cooperativity among noninteracting factors by preventing equilibrium trapping at the mouse mammary tumor virus promoter, reconciling conflicting binding data and providing combinatorial control of promoter responsiveness.

Post-Translational Modifications

Oufan Zhang, Shubhankar A. Naik, Zi Hao Liu, Julie Forman-Kay, Teresa Head-Gordon / A Curated Rotamer Library for Common Post-Translational Modifications of Proteins / arXiv 2405.03120v1

Post-translational modifications expand protein functional diversity through targeted chemical alterations on side chains. Phosphorylation adds phosphate groups primarily to serine, threonine, and tyrosine via kinases, reversibly toggling enzyme activity and regulating cell-cycle progression, apoptosis, and signaling cascades that touch roughly thirty percent of the human proteome. Glycosylation attaches glycans to asparagine or serine/threonine residues, shaping folding, stability, trafficking, and cell-matrix interactions especially for secreted and membrane proteins. Ubiquitination conjugates ubiquitin to lysines, directing proteasomal degradation while also modulating DNA repair, endocytosis, and trafficking. Acetylation and other acylations target lysines to alter chromatin structure, gene expression, protein interactions, and metabolic regulation. Methylation places methyl groups on lysines or arginines to control transcription and signaling assemblies. Lipidation anchors proteins to membranes, dictating subcellular localization and spatial organization of pathways. SUMOylation attaches SUMO to lysines for similar regulatory roles. A curated rotamer library derived from RCSB PDB entries for phosphorylated, methylated, and acetylated residues improves side-chain prediction accuracy over SIDEpro and Rosetta; these libraries integrate with Monte Carlo Side Chain Entropy sampling to generate ensembles for folded proteins and combine with Local Disordered Region Sampling inside IDPConformerGenerator for intrinsically disordered regions.

Cellular Signaling Pathways

A Note on Elementary Cellular Automata Classification / arXiv 1306.5577v2

Molecular cascades transmit signals from membrane receptors to the nucleus by converting an initial ligand-induced receptor conformational change into a series of enzyme activations, second-messenger pulses, and protein phosphorylations that ultimately modify transcription factors and chromatin, thereby altering gene expression. These cascades both carry the signal inward and amplify it, so that one receptor-ligand event can produce a large, coordinated nuclear response. Ligand binding to a cell-surface transmembrane receptor induces a conformational change, transmitted across the membrane, that activates the receptor, often by conferring kinase activity or recruiting cytoplasmic partners such as G-proteins or JAK kinases. Receptor classes able to initiate such cascades include G-protein-coupled receptors that stimulate adenylyl cyclase or phospholipase C, receptor tyrosine kinases that autophosphorylate and trigger MAPK cascades, cytokine receptors that activate STAT transcription factors via JAKs, and receptors such as Notch or those in the TGF-β and TNF families that employ proteolysis or dedicated mediators including SMADs and NF-κB. Activated receptors generate diffusible second messengers, including cAMP, which stimulates protein kinase A; calcium ions released by IP3, which engage calmodulin-dependent enzymes; and diacylglycerol, which activates protein kinase C. These messengers outnumber the original ligand molecules and thereby amplify the signal. Parallel phosphorylation cascades proceed when each activated kinase phosphorylates multiple downstream kinases or substrates, propagating and distributing the information until nuclear targets are reached.
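The amplification argument is multiplicative: if each activated molecule at one stage activates several at the next, the counts grow as a running product. A minimal sketch follows, assuming a fixed gain per stage with no saturation, feedback, or deactivation; the function name and the gain values are hypothetical and chosen only for illustration.

```python
def cascade_output(initial_events, gains):
    """Return the number of activated molecules at each stage of a cascade.

    `gains[i]` is the assumed number of downstream molecules activated by
    each active molecule at stage i.
    """
    counts = [initial_events]
    for gain in gains:
        counts.append(counts[-1] * gain)
    return counts


# One ligand-bound receptor, then three stages with illustrative gains.
print(cascade_output(1, [10, 100, 10]))   # [1, 10, 1000, 10000]
```

Even modest per-stage gains compound quickly, which is why a single receptor-ligand event can drive a large downstream response.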
