GHUCCTS Programs & Resources

Biomedical Informatics

Learn more about Biomedical Informatics (BI).

  • What's It All About

    The Biomedical Informatics (BI) component of GHUCCTS provides analytical and computational support to promote clinical and translational research. Specific activities include support for common standards development, workflow tracking, molecular and clinical data integration, brokering secure access to de-identified patient data for physician researchers at GHUCCTS institutions, dissemination of translational research data and results, and support for clinical trial management systems and tools.

    The BI team also collaborates with several institutions in the Washington, DC area to promote health information exchange, collaboration, and education. The GHUCCTS team uses several technologies, approaches, and secure methods to share patient demographics and health indicators, enabling the research enterprise and an ecosystem that can scale to collaborate with national research networks.

    Strategic goals for the BI component are to bring innovation to GHUCCTS and the national CTSA program in the following ways:

    • Drive personalized medicine research within the underserved population of the DC area through innovative academic/private partnerships and unprecedented access to protected health data, including molecular profiling and clinical outcomes.
    • Help establish an integrated and collaborative national network of de-identified genetic, imaging, clinical, economic, environmental, behavioral, and patient-reported data from our affiliated institutions that translational researchers can access.
    • Organize GHUCCTS datasets into industry standards and expose a standardized API for participation in CTSA and other national networks.
    • Build powerful clinical-omics databases and national standards, such as the MVLD standard, to drive regulatory science using enterprise-class software development tools and best practices led by field experts.
    • Provide tools through the Georgetown Database of Cancer Plus (G-DOC) to curate, organize, rank, and validate evidence for disease-biomarker associations from literature, clinical trial repositories, and other Big Data sources.
    • Develop state of the art applications and ma
    • Work with professional societies and standards organizations to develop recommendations for precision medicine and disseminate consensus views.


    The BI component of GHUCCTS is comprised of multiple centers and collaborators including:

    The core organizes and sponsors regional annual biomedical informatics symposiums:

    1. Biomedical Informatics Symposium at Georgetown University
    2. Computational Biology and Informatics Symposium at Howard University
    3. MedStar Health Annual Research Symposium
  • Who is Responsible

    Director: Subha Madhavan, MS, PhD

    Dr. Madhavan is Director of Clinical Informatics at Georgetown University Medical Center. In this role, she has designed and built the Georgetown Database of Cancer (G-DOC), a data integration platform and integrative knowledge discovery system for the oncology and translational research communities. Prior to joining Georgetown, Dr. Madhavan served as the Associate Director of Product and Program Management in the life sciences informatics area at NCI's Center for Biomedical Informatics and Information Technology (CBIIT). She is an expert in many areas of biomedical informatics and draws on a wealth of experience.

    Co-Director: William Southerland, PhD

    Dr. William M. Southerland is a Professor of Biochemistry in the Howard University College of Medicine and the Program Director for the Howard University Research Centers in Minority Institutions (RCMI) Program and the Howard University Center for Computational Biology & Bioinformatics (CCBB). Dr. Southerland's expertise is in structural modeling and simulation of biomolecules, with special emphasis on the interaction mechanisms of small molecules with both proteins and DNA. Dr. Southerland's interests also include translational bioinformatics.

    Co-Director of the IT Infrastructure Task: Adil Alaoui, MS, MBA

    Adil Alaoui provides leadership, integrated management, and strategic direction to align innovative healthcare requirements with technical solutions. Mr. Alaoui is responsible for the design and implementation of innovative health information exchange approaches to integrate disparate systems and maintain heterogeneous data in secure research data repositories. He also directs the development of tools and complex technology solutions in support of GUMC’s systems medicine vision and leads the development and implementation of efficient and effective health information technology solutions. He is the lead IT architect for GHUCCTS. Prior to joining ICBI, Mr. Alaoui led the development and successful implementation of several global telehealth and teleradiology projects sponsored by the National Library of Medicine (NLM), DoD, and Department of State. Mr. Alaoui is an adjunct lecturer at the Nursing and Health Sciences School at Georgetown University.

    Additional Team Members
    • Anas Belouali (Clinical Informatics)
    • Kanchi Krishnamurthy (Clinical Informatics)
    • Stephen Fernandez (Clinical Informatics)
    • Peter McGarvey, PhD (Bioinformatics Scientist, Translational Research)
    • Yuriy Gusev, PhD (Senior Bioinformatician, Education)
    • Krithika Bhuvaneshwar, MS (Research Associate, Data Analyst) 
  • Tell Me More

    Health Mobile Apps

    Gulf War - Complementary and Alternative Medicine (GWCAM)

    Mobile technologies have reached a level of maturity and ubiquity that makes a wide variety of devices suitable for healthcare and patient health improvement. ICBI is developing mobile health applications, including a DoD-funded project to help the VA evaluate the efficacy of a Complementary and Alternative Medicine (CAM) intervention for sleep, health functioning, and quality of life (iRest® Yoga Nidra with auricular acupuncture) in improving health-related functioning and specific symptoms (fatigue, pain, cognitive deficit, sleep disturbance).

    Massive Open Online Course (MOOC)

    Recent advances in biotechnology and genomics have led to the generation of molecular profiling data on an unprecedented scale of many petabytes per year. The challenge is to facilitate the comprehension and analysis of big datasets and make them more “user friendly”. With these goals in mind, we have developed a novel Massive Open Online Course (MOOC) that aims to facilitate the understanding, analysis, and interpretation of biomedical big data for non-computational students, basic and clinical scientists, researchers, and librarians with limited or no experience in bioinformatics.

    The 8-week course was funded by an NIH BD2K R25 grant and was released in February 2017 on the edX platform. The course covers biomedical big data as it relates to five main areas: Genomics; Transcriptomics; Proteomics; Systems Biology; and Big Data applications in translational research. It is structured in 8 weekly modules that consist of short video lectures, interviews, and online demos, with the main focus on demos and hands-on training. Students can follow the demos and perform additional exercises online to gain hands-on experience with different types of genomics data.

    Over 6,600 students from 129 countries enrolled within the first eight months and provided positive feedback on their experiences with the course. The average age of students was 29 years, and more than half had advanced degrees at the master's level or higher. We plan to maintain this course as a “living resource,” updating it regularly and keeping it freely accessible. We believe that this will allow us to provide an educational opportunity to a large audience worldwide, particularly to individuals with limited access to traditional educational resources in this field. The course can be accessed for free on the edX website.


    The Clinical Genome Resource (ClinGen) is a National Institutes of Health (NIH)-funded resource dedicated to building an authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research. ClinGen is comprised of several expert working groups that help support its key goals and overall mission. Scientists and biocurators from ICBI are involved in the ClinGen Somatic Working Group (WG), a multi-institution team engaged in curation of cancer variants for clinical utility. It consists of five different task teams performing variant curation through implementation of our Minimum Variant Level Data (MVLD) structure:

    • Pediatric: This team will initially focus on variant curation in ~31 genes implicated in childhood tumors; an ongoing 10-gene pilot is scheduled for completion by mid-2018. Many cancer genes such as WT1 are presented in oncology databases without pediatric-specific curations. The pediatric team will collaborate with the Clinical Interpretation of Variants in Cancer (CIViC) database to enhance search and delivery via pediatric-specific tagging.
    • Pancreatic: This team's focus is pancreatic ductal adenocarcinoma (PDAC). The team has identified ~5473 unique variants from 432 genes using the PANCAN Know Your Tumor resource. Roughly 38% of these genes and only ~1-2% of individual variants are present in the CIViC database, highlighting the scale and diversity of somatic mutation in PDAC.
    • Non-Small Cell Lung Cancer and Somatic TP53: These teams are actively developing variant sets and progressing in variant curation for lung cancer and the TP53 gene. The TP53 team also plans to harmonize with germline variant curation efforts.
    • Curation SOPs: This team has adopted the Association for Molecular Pathology (AMP) interpretation guidelines for variant classification, and is defining curation workflows and protocols using CIViC and ClinVar as the curation and knowledge dissemination platforms.


    In order to standardize the collection of clinically relevant somatic data, the Somatic Working Group of ClinGen created a framework of consensus data elements titled “Minimum Variant Level Data” (MVLD) (Ritter DI, et al., Genome Medicine 2016). MVLD was developed with input from multiple stakeholders ranging from database engineers to researchers and somatic clinical laboratory directors, as well as input from multiple current databases that collect cancer variant data. Briefly, MVLD consists of three sections: allele descriptive, allele interpretive and somatic interpretive. The allele descriptive section contains data elements that describe the genome position, gene, chromosome, genomic location, reference transcript and protein. The allele interpretive section contains data elements describing the somatic classification (confirmed somatic, confirmed germline or unknown), the DNA and protein substitution, the variant type and consequence and PubMed identifiers associated with interpretation.

    The somatic interpretive section contains the most clinically relevant data, and is the section that required the most discussion and consensus-building among the working group members. The somatic interpretive section contains a description of the cancer type (NCI Thesaurus, Oncotree, Disease Ontology), the Biomarker Class (Diagnostic, Prognostic, Predictive), the Therapeutic Context (associated drugs), Effect (Resistant, Responsive, Not-Responsive, Sensitive, Reduced-Sensitivity), Level of Evidence (a tiered system similar to the recent AMP/CAP/ASCO guidelines, Li, M.M., et al., J Mol Diagn, 2017)  and Sub-Level of Evidence (reporting of trials, metadata analysis, preclinical data or inferential data).
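
    As a rough illustration of the three MVLD sections described above, the record could be modeled along the following lines. This is only a sketch: the class names, field names, and types are assumptions made for this example and are not taken from the published MVLD specification, and the sample values are placeholders, not curated data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SomaticClassification(Enum):
    CONFIRMED_SOMATIC = "confirmed somatic"
    CONFIRMED_GERMLINE = "confirmed germline"
    UNKNOWN = "unknown"


class BiomarkerClass(Enum):
    DIAGNOSTIC = "Diagnostic"
    PROGNOSTIC = "Prognostic"
    PREDICTIVE = "Predictive"


class Effect(Enum):
    RESISTANT = "Resistant"
    RESPONSIVE = "Responsive"
    NOT_RESPONSIVE = "Not-Responsive"
    SENSITIVE = "Sensitive"
    REDUCED_SENSITIVITY = "Reduced-Sensitivity"


@dataclass
class AlleleDescriptive:
    # Hypothetical field names for the descriptive data elements.
    genome_position: str
    gene: str
    chromosome: str
    genomic_location: str
    reference_transcript: str
    protein: str


@dataclass
class AlleleInterpretive:
    somatic_classification: SomaticClassification
    dna_substitution: str
    protein_substitution: str
    variant_type: str
    variant_consequence: str
    pubmed_ids: List[str]


@dataclass
class SomaticInterpretive:
    cancer_type: str                # e.g. a term from NCI Thesaurus, Oncotree, or Disease Ontology
    biomarker_class: BiomarkerClass
    therapeutic_context: List[str]  # associated drugs
    effect: Effect
    level_of_evidence: str
    sub_level_of_evidence: str


@dataclass
class MVLDRecord:
    allele_descriptive: AlleleDescriptive
    allele_interpretive: AlleleInterpretive
    somatic_interpretive: SomaticInterpretive


# Placeholder values only; none of this is real curated variant data.
example = MVLDRecord(
    allele_descriptive=AlleleDescriptive(
        genome_position="POSITION_PLACEHOLDER",
        gene="GENE_PLACEHOLDER",
        chromosome="CHR_PLACEHOLDER",
        genomic_location="LOCATION_PLACEHOLDER",
        reference_transcript="TRANSCRIPT_PLACEHOLDER",
        protein="PROTEIN_PLACEHOLDER",
    ),
    allele_interpretive=AlleleInterpretive(
        somatic_classification=SomaticClassification.UNKNOWN,
        dna_substitution="DNA_CHANGE_PLACEHOLDER",
        protein_substitution="PROTEIN_CHANGE_PLACEHOLDER",
        variant_type="TYPE_PLACEHOLDER",
        variant_consequence="CONSEQUENCE_PLACEHOLDER",
        pubmed_ids=[],
    ),
    somatic_interpretive=SomaticInterpretive(
        cancer_type="CANCER_TYPE_PLACEHOLDER",
        biomarker_class=BiomarkerClass.PREDICTIVE,
        therapeutic_context=["DRUG_PLACEHOLDER"],
        effect=Effect.RESISTANT,
        level_of_evidence="LEVEL_PLACEHOLDER",
        sub_level_of_evidence="SUBLEVEL_PLACEHOLDER",
    ),
)
```

    Using enumerations for the closed vocabularies (somatic classification, biomarker class, effect) keeps free-text spellings out of the record, while the open-ended elements stay as plain strings.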

    MACE2K (Molecular and Clinical Extraction to Knowledge)

    MACE2K is a software tool that automatically extracts information and visualizes it in a value-added manner to help clinicians and clinical researchers assess the overall evidence associated with biomarkers that predict response to cancer therapies. To do this, we first developed a natural language processing (NLP) tool called eGARD (Mahmood et al., PLoS One, 2017) by extending and repurposing multiple in-house and public text mining systems. The tool can detect different data elements including cancer types, gene/protein names, SNPs, mutations, expression, copy number variations, therapies and disease outcome terms from PubMed abstracts. Entity relationships that indicate the predictive effect of genomic anomalies on therapeutic outcomes can also be detected from abstracts. The NLP tool produces output in JSON format to facilitate data exchange and integration of text mining results for the expert curator and end user interfaces. All the data is stored in a database, and cognitive systems analysis methods will be applied to optimize user interface design.

    The use of data wrangling approaches to organize and evaluate dispersed public data and associated metadata from biomarker driven studies into MACE2K will enable researchers to readily generate hypotheses for new precision medicine based clinical trials.
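
    To make the JSON hand-off between the NLP tool and the curator and end user interfaces more concrete, here is a minimal sketch of what one extraction record might look like. The schema is hypothetical: the key names, the entity type labels, the `build_extraction_record` helper, and the placeholder values are all assumptions for illustration and do not reflect the actual eGARD output format.

```python
import json

# Entity types mentioned in the text; the label spellings are assumptions.
ALLOWED_ENTITY_TYPES = {
    "cancer_type", "gene_protein", "snp", "mutation",
    "expression", "copy_number_variation", "therapy", "outcome",
}


def build_extraction_record(pmid, entities, relations):
    """Assemble one hypothetical extraction record for a PubMed abstract."""
    unknown = sorted({e["type"] for e in entities} - ALLOWED_ENTITY_TYPES)
    if unknown:
        raise ValueError(f"unsupported entity types: {unknown}")
    known_ids = {e["id"] for e in entities}
    for rel in relations:
        # Every relation must point at entities present in the same record.
        for role in ("anomaly", "therapy", "outcome"):
            if rel[role] not in known_ids:
                raise ValueError(f"relation refers to unknown entity {rel[role]!r}")
    return {"pmid": pmid, "entities": entities, "relations": relations}


record = build_extraction_record(
    pmid="PMID_PLACEHOLDER",
    entities=[
        {"id": "e1", "type": "mutation", "text": "MUTATION_PLACEHOLDER"},
        {"id": "e2", "type": "therapy", "text": "DRUG_PLACEHOLDER"},
        {"id": "e3", "type": "outcome", "text": "OUTCOME_PLACEHOLDER"},
    ],
    relations=[
        {"type": "predicts_response", "anomaly": "e1", "therapy": "e2", "outcome": "e3"},
    ],
)

# Serialize for exchange, then parse again as a consuming interface would.
payload = json.dumps(record, indent=2, sort_keys=True)
round_tripped = json.loads(payload)
```

    Keeping entities and relations in separate lists, with relations referring to entities by identifier, lets a curator interface display either the raw mentions or the predictive relationships without reparsing the abstract.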

    MS program in Health Informatics and Data Science

    Healthcare has been generating and collecting huge amounts of data from multiple sources, including electronic health record systems, medical imaging, lab data and genomics testing. At the same time, tools and methods are being developed and implemented to improve our ability to leverage, analyze, understand and act on the vast amount of health data. The intersection of these trends is helping healthcare become more efficient and providing insights that can improve patient outcomes.

    We are developing a master's program that is industry-driven and focused on current and emerging technologies and concepts that will inform healthcare by leveraging data science, big data, artificial intelligence and machine learning applications to achieve precision medicine and value-based care.

    The program is scheduled to launch in Fall 2019 and will offer courses in health data science, medical informatics, EHR mining, big data analytics, mobile health, data commons and governance, human factors engineering and safety, and leadership in health informatics, along with a mandatory capstone/internship with industry and/or government.

    BI services

    The BI component provides bioinformatics and clinical informatics support and coordinates the efforts of data stewards across GHUCCTS organizations to implement data standardization, systems interoperability, SOPs for data access, and governance to enable the exchange of data among various sources. The BI core provides secure and standardized access to over 4 million patient records from our participating institutions. We have HIPAA-certified data coordinators trained in PHI data extraction from EHRs who support investigators on a routine basis. Additional BI services include:

    • Understand investigators’ data and computing needs and provide expert technical support and solutions
    • Consult on a wide range of informatics solutions and cutting-edge technologies
    • Collaborate and partner with investigators to advance research

    Personalized Medicine (PM) is a rapidly growing area that the GHUCCTS community is committed to exploring, building on the networks developed during the first three years of the CTSA award. PM researchers use genetic screening and diagnostic tests to reduce harms and improve health outcomes; the challenge is knowing when and how to apply these tests so that they represent a clinically sound and cost-effective use of resources. There is an urgent need to use emerging big data from molecular diagnostics, electronic health records (EHR), claims and policy databases to better understand clinical utility, cost-effectiveness, and reimbursement for personalized therapies. Numerous studies indicate that PM will significantly lower costs, improve medication adherence, and enhance quality of life.

    In 2013, the BI team gave GHUCCTS investigators access to over 4 million HIPAA-protected patient records. This program, referred to as Patient Data Access and Cohort Discovery, provides researchers with access to multiple clinical sources, including EHR/EMRs, lab results, patient registries and demographic data. The BI team is standardizing patient data and providing data access for qualified researchers to power national studies in cancer, neurology, and other biomedical research fields.

    Application of clinical genomics to personalized medicine

    A primary goal of our PM effort is to support research on pharmacogenomics data, which involves the analysis of how different people respond to particular therapies based on their molecular differences. With the rapid accumulation of gene variant and genomics findings, it is feasible to generate a personalized pharmacogenomics report at the individual patient level. Such a report can be used to predict drug efficacy for different patients, as well as to create a pharmacogenomic characterization of clinical trial participants, which promises to enhance clinical trial design and increase trial success rates. We are developing tools and visualizations that will help to translate genotype information from genotyping arrays, whole genome, and whole exome studies into personalized, clinically actionable pharmacogenomic information that will enhance patient outcomes and safety. Knowledge of the effects of genomic variation on drug sensitivity, resistance, efficacy, and adverse events has the potential to transform healthcare and enable personalized/precision medicine.

  • Schedule Consultation

    Every Tuesday, 10:00 - 11:00 am, by request (please contact us)

    Areas of Consultation

    • Next generation sequencing analysis
    • Molecular profiling analysis
    • Big data analytics
    • Data Integration
    • Systems biology analysis / Network modeling and inference
    • Computational chemistry / Molecular modeling
    • Bioinformatics software development
    • Training in bioinformatics software
    • Clinical data management
    • Cohort discovery
    • G-DOC
    • REDCap

    GHUCCTS BI contact:
    To contact us, please submit a request here or email us at