High Performance Nearest Neighbor Queries with Hadoop-GIS
libCUDASP – A General Spatial Query Processing Library for GPU
Medical Vocabulary Generating Tool
TCIA Data Exploration and Information Visualization
Data Replication/Synchronization Tools
Mentor: Ashish Sharma (ashish dot sharma at emory dot edu)
Overview: The Cancer Imaging Archive provides access to a wealth of biomedical cancer imaging data. It contains over 26 million radiology images, pathology data, and clinical data. Typically, users download images to their local machines before analyzing the downloaded data. Over time, as new studies are uploaded, it becomes difficult to track which imaging studies have been downloaded by users. In this project you will propose and develop a system that can track what has been downloaded by a user, in response to a given query. Think of it as a one-way Google Drive/Dropbox (data always moves from server to client) where each folder is mapped to a particular query, and the contents of that folder are frequently updated on the server side. Your client-side solution would need to keep track of what has been downloaded and give users the option of updating their collections. Your proposed solution can include extensions to the Java-based web services that are used to create the REST API. Your client-side application can be a cross-platform thick client or a desktop-based web application.
Programming Languages/Frameworks: Java. Other tools and languages are highly dependent on your proposed strategy.
Prerequisites: Experience in distributed computing and appropriate languages.
Level of Expertise: Intermediate
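As a rough illustration of the tracking problem described above, the following minimal sketch compares a local manifest with the server's current result for the same query and lists what still needs to be downloaded. The class name, the method name, and the idea of keying each imaging series by an ID with a version string are all assumptions made for this example; they are not part of the existing TCIA REST API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of the TCIA services.
public class SyncPlanner {

    /**
     * Returns the sorted IDs of items that are on the server but are either
     * missing from the local manifest or recorded there with a different version.
     * Both maps go from an item ID to a version string (assumed, e.g. a checksum).
     */
    public static List<String> pendingDownloads(Map<String, String> local,
                                                Map<String, String> remote) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, String> entry : remote.entrySet()) {
            String localVersion = local.get(entry.getKey());
            // A null local version means the item was never downloaded.
            if (!entry.getValue().equals(localVersion)) {
                pending.add(entry.getKey());
            }
        }
        Collections.sort(pending);
        return pending;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("series-1", "v1");
        local.put("series-2", "v1");

        Map<String, String> remote = new HashMap<>();
        remote.put("series-1", "v1");
        remote.put("series-2", "v2"); // updated on the server
        remote.put("series-3", "v1"); // new on the server

        System.out.println(pendingDownloads(local, remote));
    }
}
```

Because data only moves from server to client, the sketch never looks for items that exist only locally; a real design would still have to decide what to do with studies that are removed on the server.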
Auto generate query template from AIM templates (XML)
Mentor: Pattanasak Mongkolwat (p-mongkolwat at northwestern dot edu)
Overview: The Annotation and Imaging Markup (AIM) model provides a method for capturing structured assessments of biomedical imaging data. AIM-E is software which has been developed to store and query against this structured data. A key component of AIM is the ability to create “templates” which are used to ask a series of questions related to a research hypothesis. The goal of this project would be to improve the AIM-E software to build an automated set of queries which mirrors the questions found in the AIM template.
Programming Languages: Java, XML, XPath, JSON. Experience with OSGi would be a plus.
Level of Expertise: Intermediate
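To make the idea of mirroring template questions as queries more concrete, here is a minimal sketch that reads a simplified template and emits one XPath query string per allowed answer. The XML layout (Template, Component, AllowedTerm, the label and codeValue attributes) and the shape of the generated queries are simplified assumptions for illustration only; they do not reproduce the real AIM template schema or the AIM-E query interface.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: element and attribute names are assumed, not the AIM schema.
public class TemplateQueryGenerator {

    /** Builds one XPath query string for every (question, allowed answer) pair. */
    public static List<String> generateQueries(String templateXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList components;
        try {
            components = (NodeList) xpath.evaluate("/Template/Component",
                    new InputSource(new StringReader(templateXml)),
                    XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read template", e);
        }

        List<String> queries = new ArrayList<>();
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            String label = component.getAttribute("label");
            NodeList terms = component.getElementsByTagName("AllowedTerm");
            for (int j = 0; j < terms.getLength(); j++) {
                String code = ((Element) terms.item(j)).getAttribute("codeValue");
                // Assumed target structure; values containing quotes are not escaped here.
                queries.add("//Annotation[Observation[@label='" + label
                        + "' and @codeValue='" + code + "']]");
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        String xml = "<Template>"
                + "<Component label=\"Margin\">"
                + "<AllowedTerm codeValue=\"M1\"/><AllowedTerm codeValue=\"M2\"/>"
                + "</Component>"
                + "</Template>";
        for (String query : generateQueries(xml)) {
            System.out.println(query);
        }
    }
}
```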
Web-based UI for temporal query
Mentor: Himanshu Rathod (himanshu dot rathod at emory dot edu)
Overview: You will investigate novel user interface designs for identifying interesting patient populations for clinical research and healthcare analytics. Eureka! Clinical Analytics is a web-based software system that aims to break down the layers of IT that typically sit between electronic health record data and users of that data such as researchers and healthcare operations personnel. It aims to enable those users to define variables, computed from the source data, that are useful for their analytics or research task, an activity that typically is performed by IT intermediaries. These variables may be computed as patterns in temporal sequences and frequencies of clinical attributes (visit information, vital signs, diagnoses, etc.). These data transformation concepts are challenging to present to research and operations personnel in a web user interface.
This UI is a component of Eureka! Clinical Analytics, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Language: JavaScript, HTML, CSS, jQuery, JSP
Prerequisites: JavaScript, HTML, CSS. jQuery, JSP a big plus.
Required skills – UI design.
Level of Expertise: Intermediate
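For readers unfamiliar with the kind of variable the overview calls a frequency pattern, the sketch below checks whether a clinical event occurs at least a given number of times within a sliding time window. It is an illustrative assumption about what such a variable could compute, written in plain Java; the class and method names are hypothetical and it is not Eureka!'s own implementation.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a frequency-style temporal variable.
public class FrequencyPattern {

    /**
     * Returns true if at least minCount events fall within some span of
     * windowDays days (inclusive of both end dates).
     */
    public static boolean occursAtLeast(List<LocalDate> eventDates,
                                        int minCount, long windowDays) {
        if (minCount <= 0) {
            return true;
        }
        List<LocalDate> sorted = new ArrayList<>(eventDates);
        Collections.sort(sorted);
        // Slide over the sorted dates, comparing the first and last of each run of minCount events.
        for (int i = 0; i + minCount - 1 < sorted.size(); i++) {
            long span = ChronoUnit.DAYS.between(sorted.get(i), sorted.get(i + minCount - 1));
            if (span <= windowDays) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<LocalDate> visits = Arrays.asList(
                LocalDate.of(2014, 1, 1),
                LocalDate.of(2014, 1, 10),
                LocalDate.of(2014, 3, 1));
        System.out.println(occursAtLeast(visits, 2, 30));
        System.out.println(occursAtLeast(visits, 3, 30));
    }
}
```

A user interface for this pattern would need to expose at least three inputs: the clinical attribute, the minimum count, and the window length.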
Web-based source-to-target mapping UI
Mentor: Michel Mansour (michel dot mansour at emory dot edu)
Overview: You will investigate and prototype novel designs for a user interface for defining mappings from a source data model to a target data model. Source to target mapping is a key component of Eureka! Clinical Analytics, which aims in part to connect to enterprise data warehouses at a medical institution and support straightforward preparation of those data for research and operational analytics. We currently provide the source-to-target mappings functionality only in the form of Java code and some externalized configuration files. To facilitate adoption by hospital and biomedical research IT departments, we need an elegant UI that will allow data modelers (who are not programmers) to easily define source-to-target mappings that will make their enterprise data sources available through Eureka. This mapping UI is a component of Eureka! Clinical Analytics, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Language: Java, JSP, JavaScript, HTML, CSS.
Prerequisites: Java, JavaScript, HTML, CSS. JSP a plus.
Required skills – UI design.
Level of Expertise: Advanced
Automate account creation for new users
Mentor: Himanshu Rathod (himanshu dot rathod at emory dot edu)
Overview: Eureka! Clinical Analytics is an open source web-based analytics application that provides user interfaces for new users to register for an account. Account creation is currently a manual, tedious, time-consuming, and error-prone activity. In this project, you will implement automated account creation that will run whenever a new user signs up. This code will hook into Eureka!'s existing infrastructure. It will require interaction with Eureka!'s own databases as well as with third-party software. New user registration is a component of Eureka! Clinical Analytics, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Language: Java
Prerequisites: Java, XML. JDBC/JPA experience a plus.
Level of Expertise: Intermediate
Integrate Eureka! with a web-based statistical analysis and data mining platform
Mentor: Michel Mansour (michel dot mansour at emory dot edu)
Overview: You will extend healthcare data processing software to support straightforward analysis of its output using the R programming language (http://www.r-project.org/). Eureka! Clinical Analytics, our web-based clinical data processing software, provides sophisticated functionality for preparing electronic health record data for use in research and analytics. R is one of the most popular languages for statistical analysis and data analytics. We aim to create a web-based data analysis platform using a combination of Eureka and R. While Eureka supports outputting prepared data in various formats that can be consumed by R, it has no intrinsic integration with R or any other statistical analysis or data mining tool. You will make transferring prepared data from Eureka into R as easy as possible via selection of a web-based R solution, backend integration of Eureka and the selected R solution, and minor user interface extensions to invoke R on a prepared dataset. This project is an extension of Eureka! Clinical Analytics, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Language: Java mostly, with some JSP, JavaScript, HTML and CSS.
Prerequisites: Java, JavaScript, HTML, CSS. JSP a plus.
Level of Expertise: Intermediate
Automate QA Process
Mentor: Michel Mansour (michel dot mansour at emory dot edu)
Overview: You will examine methods for and implement automated quality assurance of data for our software, Eureka! Clinical Analytics. Eureka performs complex transformations on large volumes of clinical data. We have reference datasets that we use to test the system. During every release cycle, we spend a lot of time verifying that the transformations' outputs are correct for each dataset. Since the data and transformations are well-defined, so is the output. We want you to build an automated system that, given a source dataset and expected output, computes whether the expected output and actual output are the same. If not, it should provide lots of detail about where the differences lie and what transformations may be producing incorrect output. This project is an extension of Eureka! Clinical Analytics, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Language: Java, and a scripting language like Python, Ruby, etc.
Prerequisites: Java, database experience, comfortable with complex algorithms
Level of Expertise: Advanced
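The comparison step described above can be sketched as a field-by-field diff between expected and actual output. This minimal example assumes each output is available as a map from a record ID to that record's field values; that representation, the class name, and the message format are all assumptions for illustration, since the actual Eureka! output formats are not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: the record/field representation is assumed.
public class OutputComparator {

    /** Returns one message per difference, ordered by record ID and field name. */
    public static List<String> diff(Map<String, Map<String, String>> expected,
                                    Map<String, Map<String, String>> actual) {
        List<String> differences = new ArrayList<>();
        SortedSet<String> ids = new TreeSet<>(expected.keySet());
        ids.addAll(actual.keySet());

        for (String id : ids) {
            Map<String, String> exp = expected.get(id);
            Map<String, String> act = actual.get(id);
            if (exp == null) {
                differences.add(id + ": unexpected record");
                continue;
            }
            if (act == null) {
                differences.add(id + ": missing record");
                continue;
            }
            SortedSet<String> fields = new TreeSet<>(exp.keySet());
            fields.addAll(act.keySet());
            for (String field : fields) {
                String e = exp.get(field);
                String a = act.get(field);
                if (!Objects.equals(e, a)) {
                    differences.add(id + "." + field + ": expected=" + e + ", actual=" + a);
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> expected = new HashMap<>();
        expected.put("p1", Collections.singletonMap("bmi", "high"));
        Map<String, Map<String, String>> actual = new HashMap<>();
        actual.put("p1", Collections.singletonMap("bmi", "low"));
        for (String line : diff(expected, actual)) {
            System.out.println(line);
        }
    }
}
```

An empty list means the outputs match. Tracing each differing field back to the transformation that produced it, as the overview asks, would need extra metadata that this sketch does not model.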
Data mining algorithms with NoSQL database
Mentor: Himanshu Rathod (himanshu dot rathod at emory dot edu)
Overview: We are currently working to export data from our application to NoSQL databases, to take advantage of the flexible schema. The nature of the exported data makes it suitable for a NoSQL graph database, such as Neo4j. You will explore and implement various graph algorithms within a NoSQL environment, to analyze the exported data. The aim of this effort is to provide researchers with an easy-to-use set of tools that can be used to gain a deeper understanding of the relationships within the data. The data to be analyzed is inserted into the graph database by the Eureka! Clinical Analytics application, a federally funded web application for healthcare analytics. You can learn more about Eureka! at http://aiw.sourceforge.net.
Programming Languages: Java
Prerequisites: Java. Familiarity with NoSQL databases and graph algorithms a plus.
Level of Expertise: Intermediate
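As one example of the kind of graph algorithm this project could explore, the sketch below runs a breadth-first search to find how many hops separate two nodes. It deliberately uses a plain in-memory adjacency list rather than any graph database API, and the node names (patients linked through a shared diagnosis) are invented for illustration; how the exported data is actually modeled in the graph database is not specified here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java adjacency list, not a graph database API.
public class GraphDistance {

    /** Returns the number of edges on a shortest path, or -1 if goal is unreachable. */
    public static int shortestPathLength(Map<String, List<String>> graph,
                                         String start, String goal) {
        if (start.equals(goal)) {
            return 0;
        }
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);

        while (!queue.isEmpty()) {
            String node = queue.remove();
            List<String> neighbors = graph.getOrDefault(node, Collections.<String>emptyList());
            for (String next : neighbors) {
                if (distance.containsKey(next)) {
                    continue; // already visited
                }
                distance.put(next, distance.get(node) + 1);
                if (next.equals(goal)) {
                    return distance.get(next);
                }
                queue.add(next);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("patient1", Arrays.asList("dx:diabetes"));
        graph.put("dx:diabetes", Arrays.asList("patient1", "patient2"));
        graph.put("patient2", Arrays.asList("dx:diabetes"));
        System.out.println(shortestPathLength(graph, "patient1", "patient2"));
    }
}
```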