GHUCCTS Programs & Resources

Health Informatics (HI)

Learn more about Health Informatics (HI).

What's It All About
The GHUCCTS Health Informatics (HI) core supports the development of tools, datasets, and analysis methods to help drive clinical and translational research across GHUCCTS institutions. The core has also established informatics education and training programs to train the next generation of informatics leaders.

HI includes faculty and staff members from the Innovation Center for Biomedical Informatics (ICBI) and MedStar Health Research Institute (MHRI). It has developed into a hub for research in health informatics and educates the next generation of informaticians and computational data scientists, including through a Master of Science program in Health Informatics and Data Science at Georgetown University.

HI also trains our community populations in AI and machine learning, in collaboration with other GHUCCTS members, MedStar Health, and Howard University, through NIH’s AIM-AHEAD Program and by providing real-world data sets.

Strategic goals for the HI component are to bring innovation to GHUCCTS and the national CTSA program in the following ways:
- Drive personalized medicine research within the underserved population of the DC area through innovative academic/private partnerships and unprecedented access to protected health data, including molecular profiling and clinical outcomes.
- Help establish an integrated and collaborative national network of de-identified genetic, imaging, clinical, economic, environmental, behavioral, and patient-reported data from our affiliated institutions, which can be accessed by translational researchers.
- Train the next generation of Data Scientists.
- Organize GHUCCTS datasets into industry standards.
- Build powerful clinical-omics databases and contribute to national standards to drive regulatory science.
- Provide tools to curate, organize, rank, and validate evidence for disease-biomarker associations from literature, clinical trial repositories, and other sources.
Who is Responsible
Co-Director: Peter McGarvey, PhD

Dr. Peter McGarvey is Director of ICBI, founding Co-Director for the MedStar – Georgetown Collaborative Center for Artificial Intelligence in Healthcare Research and Education (AI CoLab), and Research Professor in the Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center (GUMC). He has academic and commercial experience in bioinformatics, software development, biotechnology, and molecular biology. His research interests include genomic and proteomic analysis, biological databases, AI/ML, and data visualization. Dr. McGarvey is part of several national and international consortia, including NCI’s Clinical Proteomic Tumor Analysis Consortium (Proteomics Data Cloud and Assay Portal), the UniProt Knowledgebase, and the Clinical Genome Resource. Dr. McGarvey has a PhD in Biological Sciences from the University of Michigan and an MS in Technology Management from University of Maryland University College.

Co-Director: Nawar Shara, PhD

Dr. Shara is Chief, Research Data Science for the MedStar Health Research Institute and founding Co-Director for the MedStar – Georgetown Collaborative Center for Artificial Intelligence in Healthcare Research and Education. A seasoned biostatistician, Dr. Shara also serves as an Associate Professor of Medicine at Georgetown University School of Medicine and director of Biostatistics, Epidemiology and Research Design (BERD). Dr. Shara has more than 18 years’ experience overseeing statistical and data management activities for large, multi-center clinical trials, and has established herself as an expert in the field with numerous NIH-funded projects and federal research awards.

Co-Director of the IT Infrastructure Task: Adil Alaoui, MS, MBA

Adil Alaoui provides leadership, integrated management, and strategic direction to align innovative healthcare requirements with technical solutions. Mr. Alaoui is responsible for the design and implementation of innovative health information exchange approaches to integrate disparate systems and maintain heterogeneous data in secure research data repositories. He also directs the development of tools and complex technology solutions in support of GUMC’s systems medicine vision and leads the development and implementation of efficient and effective health information technology solutions. He is the lead IT architect for GHUCCTS. Prior to joining ICBI, Mr. Alaoui led the development and successful implementation of several global telehealth and teleradiology projects sponsored by the National Library of Medicine (NLM), DoD, and Department of State. Mr. Alaoui is director of the capstone program for the Health Informatics and Data Science MS and an adjunct lecturer at the Nursing and Health Sciences School at Georgetown University.

Additional Team Members
- Kanchi Krishnamurthy (Clinical Informatics)
- Yuriy Gusev, PhD (Senior Bioinformatician, Education)
- Shuo Wang, MS (Clinical Database Analyst)
- Camelia Bencheqroun, MS (Senior Software Engineer)
- Samir Gupta, PhD (Data Scientist)
- Yili Zhang, PhD (Postdoctoral Fellow)
Tell Me More
Precision Medicine Informatics

1. ClinGen: HI faculty are leading and organizing community data standards for curation of genetic variations within the NIH-funded ClinGen program, to augment precision medicine research. Informatics research has included:
- a novel standard called Minimal Variant Level Data (MVLD) to guide reporting of clinical lab test results for molecular diagnostics;
- natural language processing tools to automate the extraction of variation, treatment, and outcome relationships for diseases from the biomedical literature;
- a computational approach for selection of therapies targeting drug-resistant variation, which won the Marco Ramoni award at AMIA;
- a workflow for molecular simulation and functional analysis of protein variants; and
- a network approach to recommending targeted cancer therapies.
2. CPTAC: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of NCI’s Office of Cancer Clinical Proteomics Research is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer by applying proteomic technologies and workflows to tumor samples with characterized genomic and transcript profiles. The combination of proteomics, transcriptomics, and genomics data from the same tumor samples provides an unprecedented opportunity for tumor proteogenomics, as illustrated by several consortium-wide, high-profile Nature and Cell publications.

Advanced Bioinformatics Tools and Research Platforms:

HI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development. Some of our open science projects include:

G-DOC: Georgetown Database of Cancer (G-DOC) is a precision medicine platform that enables the integrative analysis of multiple data types in support of understanding disease mechanisms, biomarker discovery, data management, and education.

Virtual Research Environment (VRE): VRE is a secure cloud platform for research and education. BI took on the task of creating VRE in the cloud, leveraging the Google Cloud Platform (GCP) for provisioning computing resources and for securely storing and sharing data. VRE was designed and developed to overcome barriers met by the research community while complying with institutions’ policies and current state and federal policies and regulations. VRE is being developed to become GHUCCTS’ preferred and recommended secure cloud service for research, education, and data-sharing needs. When used appropriately, VRE can also be used to store, manage, and analyze electronic protected health information (ePHI).

VRE also enables our programs and investigators to participate in and share de-identified / standardized data with the community and research networks. VRE is a multi-mission platform that can facilitate the advancement of science, education, and services.

Clinical Study Data Collection and Surveys: REDCap (Research Electronic Data Capture) is a research tool developed at Vanderbilt University as a secure web application to allow users to build and manage online surveys and databases, and to support data capture for research studies.

Health Data Science

HI faculty and staff conduct research using public and proprietary datasets to advance Health Data Science and Precision Medicine. The goal is to derive actionable knowledge from genomics, electronic health records, registries, patient-reported, public health, and other datasets, some of which include:
- Immuno-Oncology Registry: a centralized research data warehouse for immuno-oncology that is enabling novel hypothesis generation and retrospective outcomes research at the 10 DC-Baltimore-based MedStar Health network hospitals.
- Pediatric cancer outcomes registry: a database of pediatric cancer patients who were diagnosed with various cancers at Lombardi Cancer Center’s Pediatric Oncology Program and were enrolled or treated per Children’s Oncology Group (COG) protocols between 1990 and 2014.
- REMBRANDT brain tumor registry: REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, 86 of oligodendroglioma, and a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points.
- VA suicide ideation project: Using novel approaches to extract acoustic and semantic features from audio interviews to predict suicidal tendencies in military veterans. A classifier was built that differentiates suicidal from non-suicidal veterans based on acoustic features of speech and sentiment analysis of transcribed narratives.

Data Science Challenges

Some of our active and recent challenges:
1. PrecisionFDA Challenge: PrecisionFDA and HI launched the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge. This challenge asks participants to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using gene expression, DNA copy number, and clinical data.
2. COVID-19 Data Visualization Challenge: This initiative strives to generate insights that could lead to a better understanding of the impact of physical distancing and the outbreak. We aim to bring talented participants to work on the data and generate insights that can benefit the global community’s work to understand and control the spread of this pandemic.
Educational Activities

In 2019, we launched a Master of Science program in Health Informatics & Data Science (HIDS). HIDS is an accelerated, career-ready, industry-driven program focused on the current and emerging technologies that will inform healthcare. Students gain competency in health data science, big data analytics, artificial intelligence, and machine learning applications. The curriculum aligns with the core competencies in medical informatics defined by the American Medical Informatics Association. HIDS is well poised to create a pipeline of top talent of students and trainees for GHUCCTS research, as well as to educate the next generation of leaders in informatics to transform healthcare.

SDOH Data Analytics:

In coordination with the AIM-AHEAD consortium, HI team members have designed and developed a comprehensive set of core competencies and resources related to Social Determinants of Health (SDOH). These resources are accessible to the GHUCCTS community for training, research, and development. The collaborative effort, led by a multidisciplinary team, includes features that align with current research trends and leverages data science, health informatics, and healthcare expertise. The workshops aim to educate a diverse audience, including scholars, researchers, and analysts, through hands-on sessions and access to data, tools, and methods. Coming soon.

AI for Health Care Applications:

The HI team published a series of self-contained Python notebooks, each with an accompanying recorded tutorial and example datasets. The notebooks provide a written narrative of the Python libraries that are used to clean and build training sets, define AI model architecture, and evaluate model performance. Help sessions with some live presentations were also provided.
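As a rough illustration of the three steps such a notebook walks through (clean and build a training set, define a model, evaluate it), here is a minimal sketch. It is not taken from the HI notebooks: the synthetic data, the function names, and the choice of a small logistic regression written with only the Python standard library are all assumptions made for this example.

```python
import math
import random


def make_example_rows(n, seed=0):
    """Synthetic (feature_1, feature_2, label) rows; all values are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x1 = rng.gauss(2.0 * label, 1.0)
        x2 = rng.gauss(-1.0 * label, 1.0)
        if rng.random() < 0.05:
            x1 = None  # simulate a missing measurement
        rows.append((x1, x2, label))
    return rows


def clean(rows):
    """Step 1: build the training set by dropping rows with missing values."""
    return [row for row in rows if None not in row]


def split(rows, train_fraction=0.8):
    """Hold out the last part of the data for evaluation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, epochs=300, learning_rate=0.5):
    """Step 2: the model is a logistic regression fit by batch gradient descent."""
    w1 = w2 = bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for x1, x2, y in rows:
            error = sigmoid(w1 * x1 + w2 * x2 + bias) - y
            g1 += error * x1
            g2 += error * x2
            gb += error
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        bias -= learning_rate * gb / n
    return w1, w2, bias


def accuracy(model, rows):
    """Step 3: fraction of held-out rows whose predicted class matches the label."""
    w1, w2, bias = model
    correct = sum(
        1 for x1, x2, y in rows
        if (sigmoid(w1 * x1 + w2 * x2 + bias) >= 0.5) == (y == 1)
    )
    return correct / len(rows)


rows = clean(make_example_rows(500))
train_rows, test_rows = split(rows)
model = train_logistic(train_rows)
print(f"held-out accuracy: {accuracy(model, test_rows):.2f}")
```

In practice, a notebook of this kind would more likely rely on established data science libraries; the standard-library version is used here only so the sketch runs without extra installs.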

Access to Research Datasets and Data Governance

HI has done important work on the management and distribution of biomedical data in support of research projects. Many of these informatics projects that started at GHUCCTS BI have developed independent funding from the NIH and other HHS agencies.

Several datasets are currently hosted, curated and managed by BI and available to the GHUCCTS community and investigators for research use. | View Datasets

HI data managers also act as honest brokers, providing electronic medical records, registry data, and other patient data collected during clinical care and operations for research purposes. This includes access to:
- PHI for research purposes
- Limited Data Sets
- Cohort Discovery (aggregate number) or De-Identified data.
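The difference between a cohort discovery request (an aggregate number only) and a release of row-level data can be sketched as follows. This is a toy illustration, not the HI honest broker process: the `PatientRecord` fields, the function names, and the sample records are hypothetical, and dropping one identifier column is far short of a real de-identification procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str  # direct identifier (hypothetical field)
    age: int
    diagnosis: str


def cohort_count(records, diagnosis, min_age=0, max_age=200):
    """Cohort discovery: return only the number of matching patients."""
    return sum(
        1 for r in records
        if r.diagnosis == diagnosis and min_age <= r.age <= max_age
    )


def strip_direct_identifier(record):
    """Toy row-level release: keep analytic fields, drop the identifier."""
    return {"age": record.age, "diagnosis": record.diagnosis}


# Made-up records for illustration only.
records = [
    PatientRecord("A001", 54, "glioblastoma"),
    PatientRecord("A002", 61, "glioblastoma"),
    PatientRecord("A003", 12, "astrocytoma"),
]

print(cohort_count(records, "glioblastoma"))
print([strip_direct_identifier(r) for r in records])
```

The point of the design is that `cohort_count` never exposes a row, so a researcher can check feasibility before requesting any patient-level data.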
Data Access Policies
Services & Resources
Schedule Consultation

Every Week on Tuesdays
10:00 - 11:00am – Per request (Please contact us)

Areas of Consultation
- Next generation sequencing analysis
- Molecular profiling analysis
- Big data analytics
- Data Integration
- Systems biology analysis / Network modeling and inference
- Computational chemistry / Molecular modeling
- Bioinformatics software development
- Training in bioinformatics software
- Clinical data management
- Cohort discovery
- G-DOC
- REDCap
GHUCCTS HI contact:
To contact us, please submit a request here.
