Friday, March 29, 2019
The Central Dogma of Molecular Biology
The molecule we know today as deoxyribonucleic acid (DNA) was first observed in 1869 by Swiss biologist Friedrich Miescher, who stumbled upon a substance that was resistant to protein digestion. At the time he referred to the substance as nuclein (Pray, 2008). Though Miescher remained in obscurity, Russian biochemist Phoebus Levene continued work with this substance and in 1919 discovered the three major components of a nucleotide: phosphate, sugar, and base. He noted that the sugar component was ribose for ribonucleic acid (RNA) and deoxyribose for DNA, and he proposed that nucleic acids were made up of a chain of nucleotides (Levene, 1919). He was largely correct. In 1950 Erwin Chargaff, after reading a paper by Oswald Avery in which Avery identified DNA as the carrier of hereditary information (Avery, 1944), set out to discover whether DNA composition differed among species. He found that, in contrast to Levene's proposal that nucleotides were always repeated in the same order, nucleotides appear in different orders in different organisms, yet these molecules maintain certain characteristics. This led him to develop a set of rules (known as Chargaff's rules) stating that the total number of purines (adenine and guanine) and the total number of pyrimidines (cytosine and thymine) are almost always equal in an organism's genetic material. In 1952 Rosalind Franklin and Maurice Wilkins used X-ray crystallography to capture images of the molecule's shape, and in 1953 James Watson and Francis Crick finally proposed the three-dimensional model of DNA (Watson, 1953).
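Chargaff's observation can be checked on a toy sequence. The minimal Python sketch below pairs a strand with its complementary strand and counts bases over the whole duplex. The function names are illustrative and not taken from any library, and the sketch assumes the input contains only A, C, G, and T. In a perfectly base-paired duplex, A equals T and G equals C by construction, so purines always equal pyrimidines. This is why Chargaff's ratios became a key clue for Watson-Crick base pairing.

```python
from collections import Counter

# Watson-Crick complements: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5'->3' (antiparallel to the input)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def base_composition(strand: str) -> dict:
    """Count A, C, G, T over both strands of a double-stranded molecule."""
    duplex = strand.upper() + reverse_complement(strand)
    counts = Counter(duplex)
    return {base: counts.get(base, 0) for base in "ACGT"}


def chargaff_check(strand: str) -> bool:
    """True if purines (A + G) equal pyrimidines (C + T) across the duplex."""
    c = base_composition(strand)
    return c["A"] + c["G"] == c["C"] + c["T"]
```

For example, `base_composition("ATGCGA")` returns `{'A': 3, 'C': 3, 'G': 3, 'T': 3}`, even though the single strand alone has unequal counts.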
The four main tenets of Watson and Crick's model still hold true today: 1) DNA is a double-stranded helix, 2) the majority of these helices are right-handed, 3) the two strands of the helix are antiparallel, and 4) the DNA base pairs within the helix are joined by hydrogen bonds, and the bases can also hydrogen bond with other molecules such as proteins.

The Central Dogma of Molecular Biology, first proposed by Francis Crick (Crick, 1958), describes the directional flow of information from DNA to RNA and from RNA to protein. This gene expression process starts with DNA, a double-stranded molecule consisting of base-paired nucleotides, adenine (A), cytosine (C), guanine (G), and thymine (T), on a sugar-phosphate backbone. This genetic material serves as the information storage for life, a dictionary of sorts that provides all of the necessary instructions for an organism to create its own components. During transcription, the DNA molecule is used to make messenger RNA (mRNA), which carries a specific copy of the DNA instructions to the machinery that will make the protein. Proteins are synthesized during translation using the mRNA molecule as a guide. Gene expression is a deterministic process during which each molecule is manufactured using the product of the previous step. The end result is a conversion from the genetic code into a functional unit that can be used to perform the work of the cell. As you can imagine, an organism must control this process in order to make efficient use of resources, respond to environmental changes, and differentiate cells within the body.
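The two steps of the Central Dogma can be sketched in a few lines of Python. This is a simplified illustration that rests on several assumptions. The input is the coding (sense) strand written 5′ to 3′, so transcription reduces to replacing T with U; in the cell, RNA polymerase actually reads the complementary template strand. Translation uses the standard genetic code, starting at the first AUG and stopping at the first stop codon. Splicing, capping, and polyadenylation are ignored, and the input must contain only A, C, G, and T.

```python
# Standard genetic code; codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}  # "*" marks the three stop codons (UAA, UAG, UGA)


def transcribe(coding_strand: str) -> str:
    """Return the mRNA: the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to, not including, the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("GGATGTTTGGCTAAGC"))` reads AUG, UUU, GGC and then stops at UAA, giving the short peptide `"MFG"`.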
Control of gene expression, or gene regulation as it is sometimes called, occurs at all stages along the way from DNA to protein. Regulation falls into four categories: 1) epigenetic (methylation of DNA or protein, acetylation), 2) transcriptional (involving proteins called transcription factors), 3) post-transcriptional (sequestration of RNA, alternative splicing of mRNA, microRNA (miRNA) and small interfering RNA (siRNA)), and 4) post-translational modification (phosphorylation, acetylation, methylation, ubiquitination, etc. of protein products). Epigenetic regulation involves a reversible, heritable change that does not alter the DNA sequence itself. DNA methylation occurs on cytosine bases. In proteins, arginine and lysine are the most commonly methylated amino acids. When proteins called histones contain certain methylated residues, they can repress or activate gene expression. Often this happens at the transcriptional level, determining whether the cell manufactures messenger RNA (mRNA), the precursor to proteins. Proteins are often referred to as the workhorses of the cell and are responsible for everything from catalyzing chemical reactions to providing the structural components of muscle. Some proteins, called transcription factors, help to up- or down-regulate gene expression levels. These proteins can act alone or in conjunction with other transcription factors, and they bind to DNA near gene coding regions.

[Figure: A general schema for gene expression, from DNA through mRNA to protein.]
mRNA requires a 5′ cap and a 3′ poly(A) tail in order to be exported out of the nucleus. The cap is critical for recognition by the ribosome and for protection from enzymes called RNases, which would otherwise break down the molecule. The poly(A) tail and the proteins bound to it help protect mRNA from degradation by other enzymes called exonucleases.

What can be gained by studying gene regulation? In general, it allows us to understand how an organism evolves and develops, both on a local scale (Choe, 2006; Wilson, 2008) and at the more global level of network topology. There are, however, more specific reasons to investigate this process more closely. Failure in gene regulation has been shown to be a key factor in disease (Stranger, 2007). Additionally, learning how to manipulate gene regulation may lead to the development of drugs to fight bacteria and viruses (McCauley, 2008). A clearer understanding of this process in microorganisms may lead to possible solutions to the problem of antimicrobial resistance (Courvalin, 2005).

Two major factors motivate the studies herein. First, the size and quality of biological data sets have increased dramatically in the last several years. High-throughput experimental techniques and technologies have provided large amounts of interaction data. X-ray crystallography and nuclear magnetic resonance (NMR) experiments have given us solved three-dimensional structures of proteins. Second, machine learning has become an increasingly popular tool in bioinformatics research because it allows for more robust gene and protein annotation without relying solely on sequence similarity. If a collection of attributes that distinguish between two classes of proteins can be assembled, function can be predicted.

In this work we focus primarily on regulation at the transcriptional level and the components that play a critical role in this process.
So-called nucleic acid-binding (NA-binding) proteins, which include transcription factors, are involved in this and many other cellular processes, and disruption or malfunction of transcriptional regulation may result in disease. We identify these proteins from representative data sets that include many categories of proteins. Additionally, in order to understand the underlying mechanisms, we predict the specific residues involved in nucleic acid binding using machine learning algorithms. Identifying these residues can provide practical assistance in the functional annotation of NA-binding proteins. These predictions can also be used to expedite mutagenesis experiments by guiding researchers to the correct binding residues in these proteins.

Toward the ultimate goal of attaining a deeper understanding of how nucleic acid-binding proteins facilitate the regulation of gene expression within the cell, the research set forth here focuses on three particular aspects of this problem. We begin by examining the nucleic acid-binding proteins themselves, at both the protein and residue levels. Next, we turn our attention toward protein binding sites on DNA molecules and a particular type of DNA modification that can interfere with protein binding. We then take a global perspective and study human molecular networks in the context of disease, focusing on regulatory and protein-protein interaction networks. We examine the number of partner interactions between transcription factors and how it scales with the number of target genes regulated. In several model organisms, we find that the number of partners plotted against the number of target genes appears to follow an exponential saturation curve. We also find that our generative transcriptional network model follows a similar distribution in this comparison.
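The text does not give the exact functional form of the saturation curve. The sketch below therefore assumes the common parametrization y = a(1 - exp(-b*x)), where x is the number of target genes, y is the number of partner transcription factors, a is the plateau, and b is the rate. For a fixed b the model is linear in a, so a has a closed-form least-squares estimate, and the sketch simply scans b over a grid. The grid range and the example data are assumptions for illustration, not values from this work.

```python
import math


def saturation(x, a, b):
    """Exponential saturation curve: starts at 0 and levels off at the plateau a."""
    return a * (1.0 - math.exp(-b * x))


def fit_saturation(xs, ys, b_grid=None):
    """Least-squares fit of y = a * (1 - exp(-b * x)); returns (a, b, sse)."""
    if b_grid is None:
        b_grid = [i / 1000 for i in range(1, 5001)]  # assumed search range: 0.001 to 5.0
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-b * x) for x in xs]
        denom = sum(fi * fi for fi in f)
        if denom == 0.0:
            continue  # all x are 0, so a is not identifiable for this b
        a = sum(fi * yi for fi, yi in zip(f, ys)) / denom  # closed-form best a for this b
        sse = sum((yi - a * fi) ** 2 for fi, yi in zip(f, ys))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best
```

On synthetic, noise-free data generated with `saturation(x, 12.0, 0.05)` for x in 1, 2, 5, 10, 20, 50, 100, the fit recovers a ≈ 12 and b = 0.05. With real counts, `scipy.optimize.curve_fit` is the usual alternative to the grid scan.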
We show that cancer- and other disease-related genes preferentially occupy particular positions in conserved motifs, and we find that more ubiquitously expressed disease genes have more disease associations. We also predict disease genes in the protein-protein interaction network with 79% area under the ROC curve (AUC) using an alternating decision tree (ADTree), which identifies important attributes for prediction such as degree and disease neighbor ratio. Finally, we create a co-occurrence matrix for 1854 diseases based on shared gene uniqueness and find both previously known and potentially undiscovered disease relationships.

The goal for this chapter is to predict nucleic acid binding at both the protein and residue levels using machine learning. Both sequence- and structure-based features are used to distinguish nucleic acid-binding proteins from non-binding proteins, and nucleic acid-binding residues from non-binding residues. A novel application of a costing algorithm is used for residue-level binding prediction in order to achieve high, balanced accuracy when working with imbalanced data sets.

During the past few decades, the amount of biological data available for analysis has grown exponentially. Along with this vast amount of information comes the challenge of making sense of it all. One subject of immediate concern to us as humans is health and disease. Why do we get sick, and how? Where do our bodies fail on a molecular level for this to happen? How are diseases related to each other, and do they have similar modes of action? Answering these questions will require many researchers from multiple disciplines, but where do we start? We take a bioinformatics approach and examine disease genes in a network context. In this chapter we analyze human disease and its relationship to two molecular networks. First, we find conserved motifs in the human transcription factor network and identify the location of disease- and cancer-related genes within these structures.
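The text does not spell out the costing procedure used for residue-level prediction. Assuming it refers to the Costing meta-algorithm of Zadrozny, Langford and Abe (cost-proportionate rejection sampling followed by an ensemble vote), a minimal Python sketch might look like the one below. Each training example is kept with probability cost / Z, where Z is the largest cost. Giving the rare binding class a cost of (negatives / positives) makes every resample roughly balanced. The cost scheme, the nearest-centroid base learner, and all function names are hypothetical stand-ins, not the classifiers used in this work.

```python
import math
import random


def rejection_sample(examples, costs, rng, z=None):
    """Keep example i with probability costs[i] / z (cost-proportionate rejection sampling)."""
    z = max(costs) if z is None else z
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


def train_centroids(sample):
    """Hypothetical base learner: one mean feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for features, label in sample:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def costing(examples, n_models=10, seed=0):
    """Train an ensemble on cost-proportionate resamples and predict by majority vote.

    examples: list of (feature_vector, label) pairs, label 1 = binding, 0 = non-binding.
    """
    rng = random.Random(seed)
    n_pos = sum(1 for _, label in examples if label == 1)
    n_neg = len(examples) - n_pos
    # Assumed cost scheme: missing a binding residue costs n_neg / n_pos times
    # more than a false alarm, so each resample is roughly class-balanced.
    costs = [n_neg / n_pos if label == 1 else 1.0 for _, label in examples]
    models = []
    for _ in range(n_models):
        sample = rejection_sample(examples, costs, rng)
        if {label for _, label in sample} == {0, 1}:  # skip resamples missing a class
            models.append(train_centroids(sample))

    def predict(features):
        votes = sum(predict_centroids(model, features) for model in models)
        return 1 if 2 * votes > len(models) else 0

    return predict
```

Rejection sampling, rather than simply reweighting, lets any cost-insensitive base learner be reused unchanged. Averaging over several resamples recovers some of the majority-class information that each individual sample discards.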
We find that both cancer and disease genes occupy certain positions more frequently. Next, we examine the human protein-protein interaction (PPI) network as it relates to disease. We find that we are able to predict disease genes with 79% AUC using an alternating decision tree (ADTree) classifier with 10 topological features. Additionally, we find that a combination of several network characteristics, including degree centrality and disease neighbor ratio, helps distinguish disease genes from non-disease genes. Furthermore, the ADTree classifier allows us to see which combinations of strongly predictive attributes contribute most to protein-disease classification. Finally, we build a matrix of diseases based on shared genes. Instead of using the raw count of shared genes, we use a uniqueness score for each disease gene that relates to the number of diseases with which the gene is involved. We show several interesting examples of disease relationships for which there is some clinical evidence, and some for which such evidence is lacking. We believe this matrix will be useful in finding relationships between diseases with very different phenotypes, or disease connections that may not be obvious. It could also help identify new potential drug targets through drug repositioning.
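Two of the network attributes named above, degree and disease neighbor ratio, can be sketched in plain Python alongside the AUC metric. The disease neighbor ratio is taken here to mean the fraction of a protein's interaction partners that are known disease genes. That definition, the toy network, and the function names are assumptions. The AUC is computed in its rank form: the probability that a randomly chosen disease gene scores higher than a randomly chosen non-disease gene, with ties counting half. This scores one feature at a time, whereas the ADTree classifier combines 10 topological features.

```python
def network_features(edges, disease_genes):
    """Degree and (assumed) disease neighbor ratio for each protein in an undirected PPI network."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    features = {}
    for protein, partners in neighbors.items():
        degree = len(partners)
        ratio = sum(p in disease_genes for p in partners) / degree  # fraction of disease partners
        features[protein] = (degree, ratio)
    return features


def roc_auc(scores, labels):
    """Rank-based AUC: P(random positive outscores random negative), ties count half.

    Requires at least one positive (label 1) and one negative (label 0).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

Consider a toy network with edges A-B, A-C, B-C, C-D, D-E, where A and B are the disease genes. Ranking genes by disease neighbor ratio gives an AUC of about 0.67, while degree alone gives 0.5. Real evaluations should also guard against label leakage, since a gene's own label feeds its neighbors' ratios.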