Precision Medicine Informatics
1. ClinGen: BI faculty are leading and organizing community data standards for curation of genetic variations within the NIH-funded ClinGen program, to augment precision medicine research. Informatics research has included:
- a novel standard, Minimal Variant Level Data (MVLD), to guide reporting of clinical lab test results for molecular diagnostics;
- natural language processing tools to automate the extraction of variation, treatment, and outcome relationships for diseases from the biomedical literature;
- a computational approach for selecting therapies that target drug-resistant variation, which won the Marco Ramoni award at AMIA;
- a workflow for molecular simulation and functional analysis of protein variants; and
- a network approach to recommending targeted cancer therapies.
2. MACE2K: Molecular and Clinical Extraction, a natural language processing tool for personalized medicine. As part of NIH’s BD2K (“Big Data to Knowledge”) program, we received a U01 grant to develop MACE2K (Molecular and Clinical Extraction to Knowledge for Precision Medicine). MACE2K is a software tool that automatically extracts information from unstructured data (literature, clinical charts, etc.) to help biocurators, clinicians, and clinical researchers assess the overall evidence associated with diseases and therapies.
3. CPTAC: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of NCI’s Office of Cancer Clinical Proteomics Research is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer by applying proteomic technologies and workflows to tumor samples with characterized genomic and transcript profiles. The combination of proteomics, transcriptomics, and genomics data from the same tumor samples provides an unprecedented opportunity for tumor proteogenomics, as illustrated by several consortium-wide, high-profile publications in Nature and Cell.
Advanced Bioinformatics Tools and Research Platforms
BI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development. Some of our open science projects include:
G-DOC: Georgetown Database of Cancer (G-DOC) is a precision medicine platform that enables the integrative analysis of multiple data types to understand disease mechanisms, and that supports biomarker discovery, data management, and education.
Virtual Research Environment (VRE): VRE is a secure cloud platform for research and education. BI built VRE on the Google Cloud Platform (GCP), which it uses to provision computing resources and to store and share data securely. VRE was designed and developed to overcome barriers met by the research community while complying with institutional policies and current state and federal policies and regulations. VRE is being developed to become GHUCCTS’ preferred and recommended secure cloud service for research, education, and data sharing needs. When used appropriately, VRE can also store, manage, and analyze electronic protected health information (ePHI).
VRE also enables our programs and investigators to participate in and share de-identified / standardized data with the community and research networks. VRE is a multi-mission platform that can facilitate the advancement of science, education, and services.
Clinical Study Data Collection and Surveys: REDCap (Research Electronic Data Capture) is a research tool developed at Vanderbilt University as a secure web application to allow users to build and manage online surveys and databases, and to support data capture for research studies.
Health Data Science
BI faculty and staff conduct research using public and proprietary datasets to advance Health Data Science and Precision Medicine, with the goal of deriving actionable knowledge from genomics, electronic health records, registries, patient-reported data, public health data, and other datasets. These include:
- Immuno Oncology Registry: a centralized research data warehouse for immuno-oncology that enables novel hypothesis generation and retrospective outcomes research across the 10 MedStar Health network hospitals in the DC-Baltimore area.
- Pediatric cancer outcomes registry: a database of pediatric cancer patients who were diagnosed with various cancers at Lombardi Cancer Center’s Pediatric Oncology Program and were enrolled in or treated according to Children’s Oncology Group (COG) protocols between 1990 and 2014.
- REMBRANDT brain tumor registry: REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, and 86 of oligodendroglioma, plus a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points.
- VA suicide ideation project: Using novel approaches to extract acoustic and semantic features from audio interviews to predict suicidal tendencies in military veterans. A classifier was built that differentiates suicidal from non-suicidal veterans based on acoustic features of speech and sentiment analysis of transcribed narratives.
Data Science Challenges
Some of our active and recent challenges:
- PrecisionFDA Challenge: PrecisionFDA and BI launched the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge. It asks participants to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using gene expression, DNA copy number, and clinical data.
- COVID-19 Data Visualization Challenge: This initiative strives to generate insights that could lead to a better understanding of the impact of physical distancing and of the outbreak. We aim to bring talented participants to work on the data and generate insights that can benefit the global community’s work to understand and control the spread of this pandemic.
Educational Activities
Since 2012, ICBI team members in GHUCCTS BI have organized the annual Health Informatics & Data Science Symposium at Georgetown University. This free one-day event is attended by over 350 researchers, students, and other professionals and showcases advances in molecular medicine, health data analytics, and related state-of-the-art technologies.
In 2017, we developed the MOOC “Demystifying Biomedical Big Data: A User's Guide,” which has been offered online via edX and taken by over 8,000 students and faculty worldwide.
In 2019, we launched a Master of Science program in Health Informatics & Data Science (HIDS). HIDS is an accelerated, career-ready, industry-driven program focused on current and emerging technologies that will inform healthcare. Students gain competency in health data science, big data analytics, artificial intelligence, and machine learning applications. The curriculum aligns with the core competencies in medical informatics defined by the American Medical Informatics Association. The program is well poised to create a pipeline of top students and trainees for GHUCCTS research and to educate the next generation of informatics leaders who will transform healthcare.
Access to Research Datasets and Data Governance
BI manages and distributes biomedical data in support of research projects. Many of the informatics projects that started at GHUCCTS BI have since secured independent funding from the NIH and other HHS agencies.
Several datasets are currently hosted, curated, and managed by BI and are available to the GHUCCTS community and investigators for research use.
BI data managers also act as honest brokers, providing electronic medical records, registry data, and other patient data collected during clinical care and operations for research purposes. This includes access to:
- PHI for research purposes
- Limited Data Sets
- Cohort Discovery (aggregate number) or De-Identified data.
Data Access Policies
Select Peer-Reviewed Publications
- Gusev Y, Bhuvaneshwar K, Song L, Zenklusen JC, Fine H, Madhavan S. “The REMBRANDT study, a large collection of genomic data from brain cancer patients.” (2018) Sci Data 5:180158. PMID 30106394
- Pishvaian MJ, Blais EM, Brody JR, Lyons E, DeArbeloa P, Hendifar A, Mikhail S, Chung V, Sahai V, Sohal DPS, Bellakbira S, Thach D, Rahib L, Madhavan S, Matrisian LM, Petricoin EF 3rd. “Overall survival in patients with pancreatic cancer receiving matched therapies following molecular profiling: a retrospective analysis of the Know Your Tumor registry trial.” (2020) Lancet Oncol 21(4):508-518. PMID 32135080
- Wagner AH, Walsh B, Mayfield G, Tamborero D, Sonkin D, Krysiak K, Deu-Pons J, Duren RP, Gao J, McMurry J, Patterson S, Del Vecchio Fitz C, Pitel BA, Sezerman OU, Ellrott K, Warner JL, Rieke DT, Aittokallio T, Cerami E, Ritter DI, Schriml LM, Freimuth RR, Haendel M, Raca G, Madhavan S, Baudis M, Beckmann JS, Dienstmann R, Chakravarty D, Li XS, Mockus S, Elemento O, Schultz N, Lopez-Bigas N, Lawler M, Goecks J, Griffith M, Griffith OL, Margolin AA. “A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer.” (2020) Nat Genet 52(4):448-457. PMID 32246132
- Danos AM, Krysiak K, Barnell EK, Coffman AC, McMichael JF, Kiwala S, Spies NC, Sheta LM, Pema SP, Kujan L, Clark KA, Wollam AZ, Rao S, Ritter DI, Sonkin D, Raca G, Lin WH, Grisdale CJ, Kim RH, Wagner AH, Madhavan S, Griffith M, Griffith OL. “Standard operating procedure for curation and clinical interpretation of variants in cancer.” (2019) Genome Med 11(1):76. PMID 31779674
- McCoy MD, Shivakumar V, Nimmagadda S, Jafri MS, Madhavan S. “SNP2SIM: a modular workflow for standardizing molecular simulation and functional analysis of protein variants.” (2019) BMC Bioinformatics 20(1):171. PMID 30943891
- Kancherla J, Rao S, Bhuvaneshwar K, Riggins RB, Beckman RA, Madhavan S, Corrada Bravo H, Boca SM. “Evidence-Based Network Approach to Recommending Targeted Cancer Therapies.” (2020) JCO Clin Cancer Inform 4:71-88. PMID 31990579