
Tag: Big Data

Creating databases to help cure diseases worldwide

Jessica Kissinger poses for a photo in the Infectious Diseases Institute in Uganda, where she is currently a US Fulbright Scholar. (Photo/Courtesy Jessica Kissinger)

Jessica Kissinger is using her expertise in biology and big data to help other scientists

 

Jessica Kissinger never set out to make databases. From the time she was a little girl, she wanted to be a biologist.

Today, the University of Georgia professor not only studies deadly pathogens like malaria and Cryptosporidium (a waterborne parasite), but also is a driving force behind worldwide, groundbreaking collaborations on novel databases. During her time at UGA, she has received nearly $40 million in federal and private grants and contracts.

These databases can crunch vast amounts of biological information at warp speed and reveal important patterns that pave the way for new approaches to scourges such as leishmaniasis (common in the tropics, subtropics, and southern Europe), toxoplasmosis (a systemic disease due to one of the world’s most common parasites), and Valley Fever (caused by a fungus borne on the wind that can lead to lung and systemic infections). Those patterns can point to novel drug and vaccine targets and yield fresh insights into life-threatening pathogens.

“Fighting infections and developing new drug and vaccine targets requires detailed knowledge of a pathogen and how it functions,” explained Kissinger, a Distinguished Research Professor in UGA’s Department of Genetics, Institute of Bioinformatics and Center for Tropical and Emerging Global Diseases.

And, like internet searches, the databases are all free. Kissinger said it’s likely that pharmaceutical companies are mining some of the information in their quest to discover new therapeutic targets.

“They don’t tell us what they’re working on,” she said. “A database itself doesn’t produce a cure. A database can, however, remove most barriers to analysis of existing data.”

Big Data paves the way for big advances in science

It once took an entire decade to sequence a single genome—and the cost was many millions. Today, researchers can sequence a genome in a single afternoon for a few thousand dollars, transforming the field of genomics. Similar astounding advances have reshaped other ‘omics’ specialties, such as proteomics (study of proteins), metabolomics (study of metabolism), transcriptomics (study of RNA), and epigenomics (the influence of the environment on gene function). These advances mark the “Big Data” era in biology.

“The power that is unleashed by big data is phenomenal,” said Kissinger, “and it’s a very exciting time in history, with major funders and visionaries all across the world forming consortia to create a kind of ideal data universe.” Like explorers trekking into a new world, they will make discoveries we might only imagine right now.

Creating a malaria database

Kissinger’s innovations began over 23 years ago, while she was a postdoctoral researcher at the University of Pennsylvania studying a single-celled parasite called Toxoplasma gondii. The parasite shares some important features with the malaria pathogen, whose genome was in the process of being sequenced.

“I rounded up genome data from all over the world on Plasmodium (the causative agent of malaria), and ran analyses and put it on a website, so I could study the genes it might share with Toxoplasma,” she recalled. “It turns out nobody had made the Plasmodium data available for searching before.”

Soon she and her adviser, David Roos, had a million-dollar grant to formally establish a malaria database, PlasmoDB. Since its launch in 1999, it has grown to include additional pathogens and has received continual funding from the NIH, most recently an award of up to $38.4 million to maintain what has now become the Eukaryotic Pathogen, Vector and Host Informatics Resources knowledgebase (VEuPathDB), which covers 14 different pathogens as well as host responses to infections. This comprehensive database is an integrated, centralized resource for data mining on over 500 organisms.

The databases collectively contain over nine terabytes (9,000 gigabytes) of data, and have been compared to a Wikipedia for molecular parasitology by the British Society for Parasitology, which noted back in 2006: “We don’t know what we would do without it!”

Each month, VEuPathDB receives over 11 million hits from an average of 36,000 unique visitors in more than 100 countries, including India, Brazil and Kenya. A related database on vectors of disease (such as ticks that carry Lyme disease) was recently merged into VEuPathDB. The merger expanded each resource and enables researchers to better explore data on vectors such as ticks and mosquitoes and the pathogens they transmit.

Powerful tools are key to analyzing data

The databases are not just strings of numbers or words. They allow visualizations and graphic interfaces. Already, research is emerging that can help direct vaccine and drug development away from proteins that hosts and pathogens share, in order to protect the cell. Scientists using the databases have discovered proteins that reduce severe malaria and other proteins that protect malaria parasites from the human fever response. They have also found proteins that help Toxoplasma penetrate host cells.

An average of 200 publications a month cite VEuPathDB, and to date it has been cited 24,000 times in total. Next up: cloud-ready applications and improved integration with still more databases. These databases “have become essential data mining and access platforms for fungal and parasite genomics research,” said microbiologist and plant pathologist Jason Stajich of the University of California at Riverside.

“Without powerful, user-friendly tools to analyze it, ‘Big Data’ is more a curse than a blessing,” explained John Boothroyd, an immunologist and microbiologist at Stanford University School of Medicine. “VEuPathDB is just such a tool and we owe Jessica Kissinger and her colleagues an enormous thank you for their tireless and selfless efforts to first conceive and then continuously improve this absolutely essential resource.”

Grants for related projects have come from a wide array of organizations, among them the Bill & Melinda Gates Foundation, the Sloan Foundation, and the World Health Organization. One of those projects, called ClinEpiDB, is home to a multicenter study that contains data from over 22,000 children from seven different sites in South Asia and Africa. This study is the largest ever to investigate the causes of diarrhea in children in lower- to middle-income countries. Other uses of ClinEpiDB include new data on hidden signs of malaria transmission in areas where incidence is declining, or how breastfeeding protects infants from common infections.

The VEuPathDB database would be enough to secure Kissinger’s reputation in the biological sciences, but she has not stopped there. At the University of Georgia, she was a founding member of the Institute of Bioinformatics and served as its director from 2011 to 2019. The Institute’s mission is to facilitate cutting-edge interdisciplinary research in computational biology, and the program offers both master’s and doctoral degrees. She is also a key researcher in a partnership with Emory University in Atlanta to build a national hub for infectious disease research. The two institutions have grants totaling over $45 million to work on everything from tuberculosis to HIV to malaria.

“These databases are a success beyond my wildest dreams,” said Kissinger. “They are made by biologists for other biologists and address a real-life need.”

 

This story first appeared at UGA Today.

Database offers tool for global health collaborations

 

As the big data revolution continues to evolve, access to data that cut across many disciplines becomes increasingly valuable. In the field of public health, one barrier to sharing data is the need for users to fully comprehend complex methodological details and data variables in order to properly conduct analyses.

The Clinical Epidemiology Database, ClinEpiDB.org, aims to address these barriers by not only providing access to huge volumes of data, but also providing tools to help interpret complex global epidemiologic research studies. The development of ClinEpiDB has been led by the University of Georgia’s Institute of Bioinformatics, University of Pennsylvania’s School of Arts and Sciences and its Perelman School of Medicine, and the University of Liverpool’s Institute of Integrative Biology.

On March 7, ClinEpiDB released data, methodology and documentation from “The Etiology, Risk Factors, and Interactions of Enteric Infections and Malnutrition and the Consequences for Child Health and Development” (MAL-ED) study. The MAL-ED study represents a nearly decade-long research collaboration between the Foundation for the National Institutes of Health (FNIH), Fogarty International Center, and an international network of investigators.

The MAL-ED study was designed to help identify environmental exposures early in a child’s life that are associated with shortfalls in physical growth, cognitive development, and immunity. The study characterizes gut function biomarkers on the causal pathway from environmental exposure to growth and development deficits and assesses diversity across geographic locations with respect to exposures and child health and development. The MAL-ED consortium has published a significant library of peer-reviewed publications and ClinEpiDB now makes the MAL-ED data highly visible and accessible in new and exciting ways.

“It is great to see how investments and effort directed at data being Findable, Accessible, Interoperable and Reusable (i.e., F.A.I.R.) are beginning to bear fruit,” said Jessica Kissinger, UGA Distinguished Research Professor of Genetics and co-principal investigator on the Bill & Melinda Gates Foundation award that funded the development of ClinEpiDB. “Too many important studies are buried in the scientific or medical literature and not easily accessible or reusable in moving the frontier in the important battles related to infectious disease and human health. This multi-institutional, multiple-funder, interdisciplinary approach is working.”

ClinEpiDB is also home to the Global Enteric Multicenter Study (GEMS), which contains data from more than 22,000 children from seven sites in South Asia and Africa and was the largest-ever study to investigate the causes of moderate-to-severe diarrheal illness in children in lower- to middle-income countries. The most recent ClinEpiDB release also contains data from GEMS1A, a continuation of the GEMS study that broadened its scope to include less-severe diarrheal episodes. The addition of MAL-ED adds to the growing resource of high-quality maternal and child global health data.

“Over 10 years, our international network of investigators collaborated through MAL-ED to better understand the complicated relationships among intestinal infections, nutrition and other environmental exposures on child development,” said Michael Gottlieb, FNIH deputy director of science (retired) and lead PI for the MAL-ED study. “The MAL-ED Network generated a high-quality data set, possibly the largest of its kind, on various research areas from cognitive abilities to gut function to immunological response. We are pleased to make this dataset available through ClinEpiDB so it can be used by researchers far into the future to increase scientific understanding, test new research hypotheses and design and implement better intervention strategies to reduce childhood morbidity and mortality.”

MAL-ED sites (located in Iquitos, Peru; Fortaleza, Brazil; Haydom, Tanzania; Limpopo, South Africa; Bhaktapur, Nepal; Naushero Feroze, Pakistan; Vellore, India; Dhaka, Bangladesh) allowed for comparisons to be made among and between children living in geographically and culturally diverse urban and rural environments and in countries at different levels of economic development.

MAL-ED data in ClinEpiDB account for over 1.3 million observations covering anthropometrics, nutrition, vaccination status, diarrheal and respiratory disease episodes and countless other details collected by community field workers in 2009-2014. The current release includes longitudinal data from children followed two times a week for the first 24 months of life.

Future data releases will contain data for some children up to 5 years of age. ClinEpiDB allows users to walk through these data easily via an intuitive interface, enabling point-and-click filtering, simple queries and more complex “search strategies.”

See https://youtu.be/535PcFrBH8M for a video introduction to this resource. ClinEpiDB will continue to grow and provide increased access to global datasets on malaria and on maternal and child health, thus facilitating epidemiologic research in an open data environment while protecting patient identity.

Data boost: Using big data to fight disease

Jessica Kissinger

From leisure to health, digital databases can streamline nearly every facet of modern life.

Remember when making travel plans to a single destination took hours? Now booking flights, hotels and rental cars is just a few clicks—and a credit card—away thanks to travel sites like Expedia, Travelocity and others. Travelers get to compare competitors on price, amenities, customer reviews and proximity to popular locations. The sites pull together multiple data points from various sources (such as pricing from the seller, reviews from users, and maps from Google) and organize them for customers to view.

Jessica Kissinger, the director of UGA’s Institute of Bioinformatics and a member of the Center for Tropical and Emerging Global Diseases, is doing for infectious disease research what travel sites did for vacation planning.

All over the world, researchers are racing to stop the spread of deadly and debilitating pathogens such as malaria. As those researchers and public health officials determine or record data about a disease, Kissinger and her colleagues work to make that data accessible and searchable by the global research community for free.

“We take data generated by others and make them better,” says Kissinger, a Distinguished Research Professor of Genetics. More specifically, Kissinger and a team of cell biologists, geneticists and computer scientists pull disease data from a variety of sources, translate them into standard formats and make them searchable.
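For readers curious what "translate them into standard formats and make them searchable" can look like in practice, here is a minimal Python sketch. It is an illustration only, not the team's actual pipeline: the record schema, the two source layouts, the gene identifiers and the function names are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One entry in a common, searchable format (hypothetical schema)."""
    organism: str
    gene_id: str
    description: str


def from_lab_a(row: dict) -> Record:
    # Hypothetical source A delivers dictionaries keyed "species", "id", "note".
    return Record(row["species"].strip(), row["id"].upper(), row["note"].strip())


def from_lab_b(line: str) -> Record:
    # Hypothetical source B delivers tab-separated text: gene, organism, description.
    gene_id, organism, description = line.rstrip("\n").split("\t")
    return Record(organism.strip(), gene_id.upper(), description.strip())


def search(records: list[Record], term: str) -> list[Record]:
    # Case-insensitive substring match over the description field.
    needle = term.lower()
    return [r for r in records if needle in r.description.lower()]


# Invented example data; the identifiers are placeholders, not real gene IDs.
records = [
    from_lab_a({"species": "Plasmodium falciparum", "id": "pf001", "note": "putative kinase"}),
    from_lab_b("tg042\tToxoplasma gondii\tKinase-like protein\n"),
    from_lab_b("cp007\tCryptosporidium parvum\tmembrane transporter\n"),
]

hits = search(records, "kinase")
```

Once every source is converted to the same record shape, a single search works across organisms, which is the basic idea behind putting differently formatted datasets behind one query interface.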

 


Enabling Discovery

How does building a database fight disease? Data help researchers construct and test their ideas about how to create treatments for diseases or map out ways to halt their spread.

“We don’t give them answers,” Kissinger says. “We give them a framework in which to generate and test hypotheses.”

Kissinger and her team have built databases to take on malaria and other infectious diseases such as toxoplasmosis, cryptosporidiosis and trypanosomiasis. They are also creating tools for studying childhood malnutrition and factors related to disease, and making them accessible to all as they become publicly available. These databases collectively service more than 70,000 unique users a month from more than 100 countries.

To put it simply, her work saves time. It speeds the pace of discovery for the next possible solution, the next cure. Without these databases, researchers could spend weeks, months, even years researching existing literature on a disease in the library or recreating work in the lab.


Career Evolution

Kissinger didn’t set out to build databases.

She was trained as a molecular evolutionary biologist, not a computer scientist. “I like to see how molecules change over time,” she says. “When I started in school it was about how a gene or protein evolved.”

It turned out that her field was evolving too. Technology was allowing researchers to understand molecules through bigger data sets. Now, scientists aren’t just looking at individual genes but entire genomes, which are the complete sets of genes in a cell or organism.

As the field evolved, Kissinger learned and embraced the technology. Over time, she shifted her balance away from the so-called “wet lab,” where she worked directly with the organisms, to focus mostly on the computer-based “dry lab.”

Her database work started with malaria and continued to expand.

“Now we make 10 different component databases for over 300 organisms (EuPathDB.org), a comparative database to see how conserved genes are across organisms and a new epidemiology database to study the prevalence, spread and factors related to disease in humans (ClinEpiDB.org),” she says.

Kissinger’s team relies on an expert advisory board that helps the researchers customize the databases for each disease community, so they have the largest impact on research. It helps that Kissinger started in a wet lab before diving into informatics.

“These are tools by biologists for biologists,” she says. “We have a lot of computer scientists in the middle, but I think it is that sense of being a member of that community, having your finger on the pulse of what’s going on, that allows you to keep the tools useful.”

 


 

Originally published at https://greatcommitments.uga.edu/story/data-boost/