Biomedical Informatics

Health Mobile Apps

Gulf War - Complementary and Alternative Medicine (GWCAM):

Mobile technologies have reached a level of maturity and ubiquity that makes a wide variety of devices suitable for healthcare delivery and patient health improvement. ICBI is developing mobile health applications, including a DoD-funded project to help the VA evaluate the efficacy of a Complementary and Alternative Medicine (CAM) intervention for sleep, health functioning, and quality of life (iRest® Yoga Nidra with auricular acupuncture) in improving health-related functioning and relieving specific symptoms (fatigue, pain, cognitive deficit, sleep disturbance).


Massive Open Online Course (MOOC)

Recent advances in biotechnology and genomics have led to the generation of molecular profiling data on an unprecedented scale of many petabytes per year. The challenge is to facilitate the comprehension and analysis of these big datasets and make them more "user friendly". With these goals in mind, we have developed a Massive Open Online Course (MOOC) that aims to facilitate the understanding, analysis, and interpretation of biomedical big data for non-computational students, basic and clinical scientists, researchers, and librarians with limited or no experience in bioinformatics.

The 8-week course was funded by an NIH BD2K R25 grant and was released in February 2017 on the edX platform. The course covers biomedical big data as it relates to five main areas: Genomics; Transcriptomics; Proteomics; Systems Biology; and Big Data applications in translational research. It is structured in 8 weekly modules that consist of short video lectures, interviews, and online demos, with the main focus on demos and hands-on training. Students can follow the demos and perform additional exercises online to gain hands-on experience with different types of genomics data.

Over 6,600 students from 129 countries enrolled within the first eight months and provided positive feedback on their experiences with the course. The average age of students was 29 years, and more than half had advanced degrees at the Master's level or higher. We plan to maintain this course as a "living resource," updating it regularly and keeping it freely accessible. We believe that this will allow us to provide an educational opportunity to a large audience worldwide, particularly to individuals with limited access to traditional educational resources in this cutting-edge field. The course can be accessed for free on the edX website.


The Clinical Genome Resource (ClinGen) is a National Institutes of Health (NIH)-funded effort dedicated to building an authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research. ClinGen comprises several expert working groups that support its key goals and overall mission. Scientists and biocurators from ICBI are involved in the ClinGen Somatic Working Group (WG), a multi-institution team engaged in curation of cancer variants for clinical utility. It consists of five task teams performing variant curation through implementation of our Minimum Variant Level Data (MVLD) structure:

  • Pediatric: This team will initially focus on variant curation in ~31 genes implicated in childhood tumors; an ongoing 10-gene pilot is scheduled for completion by mid-2018. Many cancer genes, such as WT1, appear in oncology databases without pediatric-specific curations. The pediatric team will collaborate with the Clinical Interpretations of Variants in Cancer (CIViC) database to enhance search and delivery via pediatric-specific tagging.
  • Pancreatic: This team's focus is pancreatic ductal adenocarcinoma (PDAC). The team has identified ~5,473 unique variants from 432 genes using the PANCAN Know Your Tumor resource. Roughly 38% of these genes and only ~1-2% of individual variants are present in the CIViC database, highlighting the scale and diversity of somatic mutation in PDAC.
  • Non-Small Cell Lung Cancer and Somatic TP53: These two teams are actively developing variant sets and progressing in variant curation for lung cancer and the TP53 gene. The TP53 team also plans to harmonize with germline variant curation efforts.
  • Curation SOPs: This team has adopted the Association for Molecular Pathology (AMP) interpretation guidelines for variant classification and is defining curation workflows and protocols using CIViC and ClinVar as the curation and knowledge dissemination platforms.


To standardize the collection of clinically relevant somatic data, the Somatic Working Group of ClinGen created a framework of consensus data elements titled "Minimum Variant Level Data" (MVLD) (Ritter DI, et al., Genome Medicine 2016). MVLD was developed with input from multiple stakeholders, ranging from database engineers to researchers and somatic clinical laboratory directors, as well as from multiple current databases that collect cancer variant data. Briefly, MVLD consists of three sections: allele descriptive, allele interpretive, and somatic interpretive. The allele descriptive section contains data elements that describe the genome position, gene, chromosome, genomic location, reference transcript, and protein. The allele interpretive section contains data elements describing the somatic classification (confirmed somatic, confirmed germline, or unknown), the DNA and protein substitution, the variant type and consequence, and the PubMed identifiers associated with the interpretation.

The somatic interpretive section contains the most clinically relevant data and is the section that required the most discussion and consensus-building among the working group members. It contains a description of the cancer type (NCI Thesaurus, Oncotree, Disease Ontology), the Biomarker Class (Diagnostic, Prognostic, Predictive), the Therapeutic Context (associated drugs), the Effect (Resistant, Responsive, Not-Responsive, Sensitive, Reduced-Sensitivity), the Level of Evidence (a tiered system similar to the recent AMP/CAP/ASCO guidelines, Li, M.M., et al., J Mol Diagn, 2017), and the Sub-Level of Evidence (reporting of trials, metadata analysis, preclinical data or inferential data).
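To make the three-section layout concrete, the sketch below models an MVLD-style record as Python dataclasses. It is only an illustration of the structure described above: the class names, field names, and placeholder values are assumptions made for this example, not the official MVLD schema or any ClinGen software.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class SomaticClassification(Enum):
    CONFIRMED_SOMATIC = "confirmed somatic"
    CONFIRMED_GERMLINE = "confirmed germline"
    UNKNOWN = "unknown"


class BiomarkerClass(Enum):
    DIAGNOSTIC = "Diagnostic"
    PROGNOSTIC = "Prognostic"
    PREDICTIVE = "Predictive"


class Effect(Enum):
    RESISTANT = "Resistant"
    RESPONSIVE = "Responsive"
    NOT_RESPONSIVE = "Not-Responsive"
    SENSITIVE = "Sensitive"
    REDUCED_SENSITIVITY = "Reduced-Sensitivity"


@dataclass
class AlleleDescriptive:
    # Field names are hypothetical; they mirror the elements listed in the text.
    genome_position: str
    gene: str
    chromosome: str
    genomic_location: str
    reference_transcript: str
    protein: str


@dataclass
class AlleleInterpretive:
    somatic_classification: SomaticClassification
    dna_substitution: str
    protein_substitution: str
    variant_type: str
    variant_consequence: str
    pubmed_ids: List[str] = field(default_factory=list)


@dataclass
class SomaticInterpretive:
    cancer_type: str                 # term from NCI Thesaurus, Oncotree, or Disease Ontology
    biomarker_class: BiomarkerClass
    therapeutic_context: List[str]   # associated drugs
    effect: Effect
    level_of_evidence: str
    sub_level_of_evidence: str


@dataclass
class MVLDRecord:
    allele_descriptive: AlleleDescriptive
    allele_interpretive: AlleleInterpretive
    somatic_interpretive: SomaticInterpretive


# All values below are placeholders, not real curated data.
example = MVLDRecord(
    allele_descriptive=AlleleDescriptive(
        genome_position="POSITION-PLACEHOLDER",
        gene="GENE1",
        chromosome="CHR-PLACEHOLDER",
        genomic_location="LOCATION-PLACEHOLDER",
        reference_transcript="TRANSCRIPT-PLACEHOLDER",
        protein="PROTEIN-PLACEHOLDER",
    ),
    allele_interpretive=AlleleInterpretive(
        somatic_classification=SomaticClassification.CONFIRMED_SOMATIC,
        dna_substitution="DNA-CHANGE-PLACEHOLDER",
        protein_substitution="PROTEIN-CHANGE-PLACEHOLDER",
        variant_type="TYPE-PLACEHOLDER",
        variant_consequence="CONSEQUENCE-PLACEHOLDER",
        pubmed_ids=["PMID-PLACEHOLDER"],
    ),
    somatic_interpretive=SomaticInterpretive(
        cancer_type="CANCER-TYPE-PLACEHOLDER",
        biomarker_class=BiomarkerClass.PREDICTIVE,
        therapeutic_context=["DRUG-PLACEHOLDER"],
        effect=Effect.SENSITIVE,
        level_of_evidence="LEVEL-PLACEHOLDER",
        sub_level_of_evidence="SUB-LEVEL-PLACEHOLDER",
    ),
)

record_as_dict = asdict(example)  # nested dict, convenient for export
```

Using enumerations for the controlled vocabularies (somatic classification, biomarker class, effect) means a misspelled value fails at construction time rather than surfacing later during curation review.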

MACE2K (Molecular and Clinical Extraction to Knowledge)

MACE2K is a software tool that automatically extracts information and visualizes it in a value-added manner to help clinicians and clinical researchers assess the overall evidence associated with biomarkers that predict response to cancer therapies. To do this, we first developed a natural language processing (NLP) tool called eGARD (Mahmood et al., PLoS One, 2017) by extending and repurposing multiple in-house and public text mining systems. The tool can detect different data elements in PubMed abstracts, including cancer types, gene/protein names, SNPs, mutations, expression, copy number variations, therapies, and disease outcome terms. It can also detect entity relationships that indicate the predictive effect of genomic anomalies on therapeutic outcomes. The NLP tool produces output in JSON format to facilitate data exchange and integration of text mining results for the expert curator and end user interfaces. All the data is stored in a database, and cognitive systems analysis methods will be applied to optimize user interface design.
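As a rough illustration of why JSON output eases integration, the sketch below parses a small set of extraction records and groups the supporting abstracts by biomarker and therapy. The JSON keys and values are invented for this example; the actual eGARD output schema is not described here and may differ.

```python
import json
from collections import defaultdict

# Hypothetical extraction records; keys and values are placeholders only.
SAMPLE_OUTPUT = """
[
  {"pmid": "PLACEHOLDER-1", "cancer_type": "cancer A", "gene": "GENE1",
   "anomaly": "mutation", "therapy": "drug X", "outcome": "response"},
  {"pmid": "PLACEHOLDER-2", "cancer_type": "cancer A", "gene": "GENE1",
   "anomaly": "mutation", "therapy": "drug X", "outcome": "resistance"},
  {"pmid": "PLACEHOLDER-3", "cancer_type": "cancer B", "gene": "GENE2",
   "anomaly": "expression", "therapy": "drug Y", "outcome": "response"}
]
"""


def group_by_biomarker(records):
    """Collect abstract identifiers per (gene, anomaly, therapy) triple."""
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["gene"], rec["anomaly"], rec["therapy"])
        grouped[key].append(rec["pmid"])
    return dict(grouped)


records = json.loads(SAMPLE_OUTPUT)
evidence = group_by_biomarker(records)
for (gene, anomaly, therapy), pmids in sorted(evidence.items()):
    print(f"{gene} {anomaly} / {therapy}: {len(pmids)} abstract(s)")
```

A grouping like this is one way a curator interface could show how many abstracts support each biomarker and therapy pair before the expert reviews them individually.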

Using data wrangling approaches to organize and evaluate dispersed public data and associated metadata from biomarker-driven studies in MACE2K will enable researchers to readily generate hypotheses for new precision-medicine-based clinical trials.

MS program in Health Informatics and Data Science

Healthcare generates and collects huge amounts of data from multiple sources, including electronic health record systems, medical imaging, lab data, and genomic testing. At the same time, tools and methods are being developed and implemented to improve our ability to leverage, analyze, understand, and act on this vast amount of health data. The intersection of these trends is helping healthcare become more efficient and is providing insights that can improve patient outcomes.

We are developing a Master's program that is industry driven and focused on current and emerging technologies and concepts that will inform healthcare by leveraging data science, big data, Artificial Intelligence, and Machine Learning applications to achieve Precision Medicine and value-based care.

The program is scheduled to launch in Fall 2019 and will offer courses in health data science, medical informatics, EHR mining, big data analytics, mobile health, data commons and governance, human factors engineering and safety, and leadership in health informatics, as well as a mandatory capstone/internship with industry and/or government.

BI services:

The BI Component provides Bioinformatics & Clinical Informatics Support and coordinates the efforts of data stewards across GHUCCTS organizations to implement data standardization, systems interoperability, SOPs for data access, and governance to enable the exchange of data among various sources. The BI core provides secure and standardized access to over 4 million patient records from our participating institutions. Our HIPAA-certified data coordinators, trained in PHI data extraction from EHRs, support investigators on a routine basis. Additional BI services include:

  • Understand investigators’ data and computing needs and provide expert technical support and solutions
  • Consult on a wide range of informatics solutions and cutting-edge technologies
  • Collaborate and partner with investigators to advance research

Personalized Medicine (PM) is a rapidly growing area that the GHUCCTS community is committed to exploring, building on the extensive networks developed during the first three years of the CTSA award. PM researchers use genetic screening and diagnostic tests to reduce harms and improve health outcomes; the challenge is knowing when and how to apply these tests so that they represent a clinically sound and cost-effective use of resources. There is an urgent need to use emerging big data from molecular diagnostics, electronic health records (EHR), and claims and policy databases to better understand clinical utility, cost-effectiveness, and reimbursement for personalized therapies. Numerous studies indicate that PM will significantly lower costs, improve medication adherence, and enhance quality of life.

In 2013, the BI team gave GHUCCTS investigators access to over 4 million HIPAA-protected patient records. This program, referred to as Patient Data Access and Cohort Discovery, provides researchers with access to multiple clinical sources, including EHR/EMRs, lab results, patient registries, and demographic data. The BI team is standardizing patient data and providing data access for qualified researchers to power national studies in cancer, neurology, and other biomedical research fields.
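The idea behind cohort discovery over standardized records can be sketched in a few lines. The example below is purely illustrative: the record layout, field names, and diagnosis codes are assumptions, and it does not describe the actual GHUCCTS system. It counts distinct patients matching simple criteria and returns only an aggregate number, which is how a feasibility query would typically avoid exposing individual records.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class PatientRecord:
    patient_key: str                 # hypothetical de-identified key shared across sources
    source: str                      # e.g. "EHR", "registry", "lab"
    age: int
    diagnosis_codes: FrozenSet[str]  # standardized codes (placeholders here)


def count_cohort(records: Iterable[PatientRecord], diagnosis_code: str,
                 min_age: int, max_age: int) -> int:
    """Count distinct patients with the diagnosis whose age is in range."""
    matching_keys = {
        r.patient_key
        for r in records
        if diagnosis_code in r.diagnosis_codes and min_age <= r.age <= max_age
    }
    return len(matching_keys)


# Synthetic records; the same patient can appear in more than one source.
synthetic_records = [
    PatientRecord("P1", "EHR", 54, frozenset({"DX-001"})),
    PatientRecord("P1", "registry", 54, frozenset({"DX-001", "DX-002"})),
    PatientRecord("P2", "EHR", 37, frozenset({"DX-001"})),
    PatientRecord("P3", "lab", 61, frozenset({"DX-003"})),
]

cohort_size = count_cohort(synthetic_records, "DX-001", min_age=40, max_age=70)
print(cohort_size)
```

Deduplicating on a shared patient key matters once several sources are combined; otherwise a patient who appears in both an EHR and a registry would be counted twice.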

Application of clinical genomics to personalized medicine

A primary goal of our PM effort is to support research on pharmacogenomics data, which involves the analysis of how different people respond to particular therapies based on their molecular differences. With the rapid growth in knowledge about gene variants and genomics, it is now feasible to generate a personalized pharmacogenomics report at the individual patient level. Such reports can be used to predict drug efficacy for different patients, as well as to create a pharmacogenomic characterization of clinical trial participants, which promises to enhance clinical trial design and increase trial success rates. We are developing tools and visualizations that will help translate genotype information from genotyping arrays, whole genome, and whole exome studies into personalized, clinically actionable pharmacogenomic information that will enhance patient outcomes and safety. Knowledge of the effects of genomic variation on drug sensitivity, resistance, efficacy, and adverse events has the potential to transform healthcare and truly enable the personalized/precision medicine revolution.
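One simple way to picture the genotype-to-report translation is as two table lookups: diplotype to predicted phenotype, then phenotype to a note for the clinician. The sketch below assumes this lookup design for illustration only; the gene name, phenotype labels, and guidance text are placeholders, not clinical recommendations and not the tools under development.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables; a real system would draw on curated knowledge bases.
PHENOTYPES: Dict[Tuple[str, Tuple[str, str]], str] = {
    ("GENE_A", ("*1", "*1")): "normal function",
    ("GENE_A", ("*1", "*2")): "reduced function",
    ("GENE_A", ("*2", "*2")): "no function",
}

GUIDANCE: Dict[Tuple[str, str], str] = {
    ("GENE_A", "normal function"): "no change suggested for DRUG_X (placeholder)",
    ("GENE_A", "reduced function"): "review DRUG_X dosing (placeholder)",
    ("GENE_A", "no function"): "consider alternative to DRUG_X (placeholder)",
}


def normalize_diplotype(allele_1: str, allele_2: str) -> Tuple[str, str]:
    """Order the two alleles so that ('*2', '*1') and ('*1', '*2') match."""
    first, second = sorted((allele_1, allele_2))
    return (first, second)


def build_report(genotypes: Dict[str, Tuple[str, str]]) -> List[dict]:
    """Translate per-gene allele pairs into report rows."""
    rows = []
    for gene, (allele_1, allele_2) in sorted(genotypes.items()):
        diplotype = normalize_diplotype(allele_1, allele_2)
        phenotype = PHENOTYPES.get((gene, diplotype), "unknown")
        note = GUIDANCE.get((gene, phenotype), "no guidance in table")
        rows.append({
            "gene": gene,
            "diplotype": "/".join(diplotype),
            "phenotype": phenotype,
            "note": note,
        })
    return rows


report = build_report({"GENE_A": ("*2", "*1"), "GENE_B": ("*1", "*1")})
for row in report:
    print(row)
```

Genes or diplotypes missing from the tables are reported as "unknown" rather than silently dropped, so gaps in the knowledge base stay visible in the report.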