GHUCCTS Programs & Resources


Biomedical Informatics

Learn more about Biomedical Informatics (BI).

  • What's It All About

    The GHUCCTS Biomedical Informatics (BI) core supports the development of impactful tools, datasets, and methods to drive clinical and translational research across GHUCCTS institutions. The core has established multiple informatics education and training programs to train the next generation of informatics leaders, including an annual Health Informatics and Data Science Symposium, massive open online courses, and a Master of Science program in Health Informatics and Data Science at Georgetown University.

    BI includes faculty and staff members from the Innovation Center for Biomedical Informatics (ICBI) and Georgetown’s Office of Chief Data Scientist. It has developed into a hub for research in biomedical informatics and educates the next generation of informaticians and computational data scientists. The year 2019 marked the eighth instance of our annual symposium, which attracts more than 300 participants from academia, health systems, health IT organizations, and government agencies.

    Recently, GUMC established the Office of Chief Data Scientist and nominated GHUCCTS BI’s director, Dr. Subha Madhavan, as Chief Data Scientist. The office oversees all research-related data initiatives as well as a portfolio to develop informatics and health data science research and educational programs at Georgetown University. It pursues collaborative opportunities with clinical partners (MedStar Health in DC and MD and Hackensack University hospital system in NJ) and industry partners on emerging topics in health data science, such as artificial intelligence, machine learning, and leveraging big data to advance health care and public health.

    Strategic goals for the BI component are to bring innovation to GHUCCTS and the national CTSA program in the following ways:

    • Drive personalized medicine research within the underserved population of the DC area through innovative academic/private partnerships and unprecedented access to protected health data, including molecular profiling and clinical outcomes.
    • Help establish an integrated and collaborative national network of de-identified genetic, imaging, clinical, economic, environmental, behavioral, and patient-reported data from our affiliated institutions, which can be accessed by translational researchers.
    • Organize GHUCCTS datasets according to industry standards and expose a standardized API for participation in CTSA and other national networks.
    • Build powerful clinical-omics databases and national standards, such as the MVLD standard, to drive regulatory science using enterprise-class software development tools and best practices led by field experts.
    • Provide tools through the Georgetown Database of Cancer Plus (G-DOC) to curate, organize, rank, and validate evidence for disease-biomarker associations from literature, clinical trial repositories, and other Big Data sources.
    • Work with professional societies and standards organizations to develop recommendations for precision medicine and disseminate consensus views.


    The BI component of GHUCCTS comprises multiple centers and collaborators.

    The core organizes and sponsors regional annual biomedical informatics symposiums:

    1. Biomedical Informatics Symposium at Georgetown University
    2. Computational Biology and Informatics Symposium at Howard University
    3. MedStar Health Annual Research Symposium

    Summary of BI's top accomplishments from the past 10 years

  • Who is Responsible

    Director: Peter McGarvey, PhD

    Dr. Peter McGarvey is Director of ICBI and Research Professor in the Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center (GUMC). He has academic and commercial experience in bioinformatics, software development, biotechnology, and molecular biology. His research interests include genomic and proteomic analysis, biological databases, and data visualization. Currently, Dr. McGarvey helps manage several projects at ICBI and GUMC, including the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Data Center and Assay Portal, the Protein Information Resource, the UniProt Knowledgebase, and Molecular and Clinical Extraction to Knowledge (MACE2K). Dr. McGarvey has a PhD in Biological Sciences from the University of Michigan and an MS in Technology Management from University of Maryland University College.

    Co-Director of the IT Infrastructure Task: Adil Alaoui, MS, MBA

    Adil Alaoui provides leadership, integrated management, and strategic direction to align innovative healthcare requirements with technical solutions. Mr. Alaoui is responsible for the design and implementation of innovative health information exchange approaches to integrate disparate systems and maintain heterogeneous data in secure research data repositories. He also directs the development of tools and complex technology solutions in support of GUMC’s systems medicine vision and leads the development and implementation of efficient and effective health information technology solutions. He is the lead IT architect for GHUCCTS. Prior to joining ICBI, Mr. Alaoui led the development and successful implementation of several global telehealth and teleradiology projects sponsored by the National Library of Medicine (NLM), DoD, and Department of State. Mr. Alaoui is an adjunct lecturer at the Nursing and Health Sciences School at Georgetown University.

    Additional Team Members
    • Anas Belouali (Clinical Informatics)
    • Kanchi Krishnamurthy (Clinical Informatics)
    • Stephen Fernandez (Clinical Informatics)
    • Yuriy Gusev, PhD (Senior Bioinformatician, Education)
    • Krithika Bhuvaneshwar, MS (Research Associate, Data Analyst) 
  • Tell Me More

    Precision Medicine Informatics

    1. ClinGen: BI faculty are leading and organizing community data standards for curation of genetic variations within the NIH-funded ClinGen program, to augment precision medicine research. Informatics research has included:

    • a novel standard called Minimal Variant Level Data (MVLD) to guide reporting of clinical lab test results for molecular diagnostics;
    • natural language processing tools to automate the extraction of variation, treatment, and outcome relationships for diseases from the biomedical literature;
    • a computational approach for selection of therapies targeting drug-resistant variation that won the Marco Ramoni award at AMIA;
    • a workflow for molecular simulation and functional analysis of protein variants; and
    • a network approach to recommending targeted cancer therapies.

    2. MACE2K (Molecular and Clinical Extraction to Knowledge for Precision Medicine): a natural language processing tool for personalized medicine. As part of NIH’s BD2K (“Big Data to Knowledge”) program, we received a U01 grant to develop MACE2K. MACE2K is a software tool that automatically extracts information from unstructured data (literature, clinical charts, etc.) to help biocurators, clinicians, and clinical researchers assess the overall evidence associated with diseases and therapies.

    3. CPTAC: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of NCI’s Office of Cancer Clinical Proteomics Research is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to tumor samples with characterized genomic and transcript profiles. The combination of proteomics, transcriptomics, and genomics data from the same tumor samples provides an unprecedented opportunity for tumor proteogenomics, as illustrated by several consortium-wide, high-profile Nature and Cell publications.


    Advanced Bioinformatics Tools and Research Platforms:

    BI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development. Some of our open science projects include:

    G-DOC: Georgetown Database of Cancer (G-DOC) is a precision medicine platform that enables the integrative analysis of multiple data types to understand disease mechanisms and supports biomarker discovery, data management, and education.

    Virtual Research Environment (VRE): VRE is a secure cloud platform for research and education. BI took on the task of creating VRE in the cloud, leveraging the Google Cloud Platform (GCP) to provision computing resources and to store and share data securely. VRE was designed and developed to overcome barriers met by the research community while complying with institutions’ policies and current state and federal policies and regulations. VRE is being developed to become GHUCCTS’ preferred and recommended secure cloud service for research, education, and data-sharing needs. When used appropriately, VRE can also store, manage, and analyze electronic protected health information (ePHI).

    VRE also enables our programs and investigators to participate in and share de-identified / standardized data with the community and research networks. VRE is a multi-mission platform that can facilitate the advancement of science, education, and services.

    Clinical Study Data Collection and Surveys: REDCap (Research Electronic Data Capture) is a research tool developed at Vanderbilt University as a secure web application to allow users to build and manage online surveys and databases, and to support data capture for research studies. 


    Health Data Science

    BI faculty and staff conduct research using public and proprietary datasets to advance Health Data Science and Precision Medicine, with the goal of deriving actionable knowledge from genomics, electronic health records, registries, patient-reported data, public health data, and other datasets. Some of these include:

    • Immuno Oncology Registry: a centralized research data warehouse for immuno-oncology that enables novel hypothesis generation and retrospective outcomes research at the 10 MedStar Health network hospitals in the DC-Baltimore area.
    • Pediatric cancer outcomes registry: a database of pediatric cancer patients who were diagnosed with various cancers at Lombardi Cancer Center’s Pediatric Oncology Program and were enrolled or treated per Children’s Oncology Group (COG) protocols between 1990 and 2014.
    • REMBRANDT brain tumor registry: REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, 86 tissues of oligodendroglioma, and a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points.
    • VA suicide ideation project: This project uses novel approaches to extract acoustic and semantic features from audio interviews to predict suicidal tendencies in military veterans. A classifier was built that differentiates suicidal from non-suicidal veterans based on acoustic features of speech and sentiment analysis of transcribed narratives.


    Data Science Challenges

    Some of our active and recent challenges:

    1. PrecisionFDA Challenge: PrecisionFDA and BI launched the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge. This challenge asks participants to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using gene expression, DNA copy number, and clinical data.

    2. COVID-19 Data Visualization Challenge: This initiative strives to generate insights that could lead to a better understanding of the impact of physical distancing and of the outbreak. We aim to bring amazing talent to work on the data and generate insights that can benefit the global community’s work to understand and control the spread of this pandemic.

    Educational Activities

    Since 2012, ICBI team members in GHUCCTS BI have organized the annual Health Informatics & Data Science Symposium at Georgetown University. This free one-day event is attended by over 350 researchers, students, and other professionals and showcases advances in molecular medicine, health data analytics, and related state-of-the-art technologies.

    In 2017, we developed the MOOC Demystifying Biomedical Big Data: A User's Guide, which has been offered online via edX and taken by more than 8,000 students and faculty worldwide.

    In 2019, we launched a Master of Science program in Health Informatics & Data Science (HIDS). HIDS is an accelerated, career-ready, industry-driven program focused on current and emerging technologies that will inform healthcare. Students gain competency in health data science, big data analytics, artificial intelligence, and machine learning applications. The curriculum aligns with the core competencies in medical informatics defined by the American Medical Informatics Association. The program is well poised to create a pipeline of top students and trainees for GHUCCTS research and to educate the next generation of informatics leaders who will transform healthcare.

    Access to Research Datasets and Data Governance

    BI has done important work on the management and distribution of biomedical data in support of research projects. Many of these informatics projects that started at GHUCCTS BI have developed independent funding from the NIH and other HHS agencies.

    Several datasets are currently hosted, curated and managed by BI and available to the GHUCCTS community and investigators for research use. | View Datasets

    BI data managers also act as honest brokers, providing electronic medical records, registry data, and other patient data collected during clinical care and operations for research purposes. This includes access to:

    • PHI for research purposes
    • Limited Data Sets
    • Cohort Discovery (aggregate number) or De-Identified data.

    Data Access Policies

    Select Peer-Reviewed Publications

    • Gusev Y, Bhuvaneshwar K, Song L, Zenklusen JC, Fine H, Madhavan S. “The REMBRANDT study, a large collection of genomic data from brain cancer patients.” (2018) Nature Scientific Data 5:180158. PMID 30106394
    • Pishvaian MJ, Blais EM, Brody JR, Lyons E, DeArbeloa P, Hendifar A, Mikhail S, Chung V, Sahai V, Sohal DPS, Bellakbira S, Thach D, Rahib L, Madhavan S, Matrisian LM, Petricoin EF 3rd. “Overall survival in patients with pancreatic cancer receiving matched therapies following molecular profiling: a retrospective analysis of the Know Your Tumor registry trial.” (2020) Lancet Oncol. 21(4):508-518. PMID 32135080
    • Wagner AH, Walsh B, Mayfield G, Tamborero D, Sonkin D, Krysiak K, Deu-Pons J, Duren RP, Gao J, McMurry J, Patterson S, Del Vecchio Fitz C, Pitel BA, Sezerman OU, Ellrott K, Warner JL, Rieke DT, Aittokallio T, Cerami E, Ritter DI, Schriml LM, Freimuth RR, Haendel M, Raca G, Madhavan S, Baudis M, Beckmann JS, Dienstmann R, Chakravarty D, Li XS, Mockus S, Elemento O, Schultz N, Lopez-Bigas N, Lawler M, Goecks J, Griffith M, Griffith OL, Margolin AA. “A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer.” (2020) Nat Genetics 52(4):448-457. PMID 32246132
    • Danos AM, Krysiak K, Barnell EK, Coffman AC, McMichael JF, Kiwala S, Spies NC, Sheta LM, Pema SP, Kujan L, Clark KA, Wollam AZ, Rao S, Ritter DI, Sonkin D, Raca G, Lin WH, Grisdale CJ, Kim RH, Wagner AH, Madhavan S, Griffith M, Griffith OL. “Standard operating procedure for curation and clinical interpretation of variants in cancer.” (2019) Genome Med. 11(1):76. PMID 31779674
    • McCoy MD, Shivakumar V, Nimmagadda S, Jafri MS, Madhavan S. “SNP2SIM: a modular workflow for standardizing molecular simulation and functional analysis of protein variants.” (2019) BMC Bioinformatics. 20(1):171. PMID 30943891
    • Kancherla J, Rao S, Bhuvaneshwar K, Riggins RB, Beckman RA, Madhavan S, Corrada Bravo H, Boca SM. “Evidence-Based Network Approach to Recommending Targeted Cancer Therapies.” (2020) JCO Clin Cancer Inform. 4:71-88. PMID 31990579
  • Schedule Consultation

    Every week on Tuesdays
    10:00 - 11:00 am, by request (please contact us)

    Areas of Consultation

    • Next generation sequencing analysis
    • Molecular profiling analysis
    • Big data analytics
    • Data Integration
    • Systems biology analysis / Network modeling and inference
    • Computational chemistry / Molecular modeling
    • Bioinformatics software development
    • Training in bioinformatics software
    • Clinical data management
    • Cohort discovery
    • G-DOC
    • REDCap

    GHUCCTS BI contact:
    To contact us, please submit a request here or email us at