Big Data Summer School Is in Session—Virtually

Beginning September 2, Caltech and JPL will be offering an unusual take on the massive open online course (MOOC) model: a two-week-long "virtual summer school" class, providing advanced instruction by experts at Caltech and JPL on the computational skills and methods used in the analysis of complex data sets—that is, of "big data."

Why big data? "Science in the 21st century is becoming increasingly data-driven, and we need new tools for extracting knowledge from massive and complex data sets," says Caltech professor of astronomy George Djorgovski, one of the organizers of the summer school. "Our students and postdocs need to master such skills in order to be effective researchers today."

According to Richard Doyle, program manager of JPL's Information and Data Science Program Office and co-organizer of the course along with Djorgovski and JPL's Dan Crichton, "the challenges of distributed data analytics in the big data era are on the critical path to our future success in conceiving, designing, operating and, most importantly, extracting scientific results from NASA science missions. By joining with Caltech, we combine the intellectual strengths of a leading research institution with JPL's established science, engineering, and technology leadership in accomplishing NASA science missions."

"It is imperative that we begin now to educate our workforce on the nature of the challenges, along with the best available ideas for achieving technical solutions," adds Crichton, director of JPL's Center for Data Science and Technology. "We will be impressing on the students the importance of taking a full life-cycle approach to data-intensive science, from the point of data collection—which may be at Mars, Jupiter or beyond—to grappling with the daunting realities of massive, heterogeneous, highly distributed archived data sets to extract reproducible scientific understanding of Earth, astrophysical, and planetary data. These solutions can apply to many other important fields, such as medicine, health care, and bioinformatics."

"Caltech and JPL are starting a joint research venture in the arena of big data science, and this is our first joint educational offering," says Djorgovski. "It is fittingly both timely and innovative in its approach."

The course has a unique two-tiered format for student enrollment. The first tier consists of a group of 36 official students chosen from a pool of hundreds of applicants. The group includes graduate students, postdocs, and staff scientists from Caltech, JPL, and other institutions in the United States and around the world who already have a strong background in data-driven computing and statistics as well as research experience. Each weekday during the two-week term, these students will watch prerecorded video modules prepared by the course's 11 instructors and then perform hands-on computational exercises to practice what they have learned. Instructors will be available for interactive online sessions.

The second tier is for anyone, anywhere, who wants to take the course, free of charge (but for no credit), through the online learning platform Coursera. The Caltech-JPL Summer School on Big Data Analytics—the first professional summer class offered by Coursera—will be posted at the same time as the regular session, although these students will have no promise of instructor interaction. However, in a twist on the traditional MOOC, which is structured to match an actual classroom learning experience, students will be able to proceed entirely at their own pace. "You can sign up whenever you want. You can go through it at your own pace; take only some of it, or all of it," explains Djorgovski.

At the end of the two-week term, all of the developed content will migrate to Coursera's new On-Demand course platform.

"This is the first Caltech Coursera MOOC using this model, and it is new to Coursera, too," says Leslie Maxfield, director of Academic Media Technologies at Caltech, which supports the Institute's Coursera and edX online courses—now totaling eight in all, with more under development—in collaboration with the Center for Teaching, Learning, and Outreach. "Offering courses as on-demand allows students to fit online education into their busy schedules, and will hopefully increase completion rates," she says. (For a list of Caltech's MOOCs and links to registration for upcoming sessions, go to

Finally, in addition to being available indefinitely on Coursera as a stand-alone course, the summer school materials will be used in Djorgovski's spring 2015 Caltech course Methods of Computational Science, which will be offered as a MOOC and used for a "flipped" classroom approach. "A flipped classroom reverses, or 'flips,' when students passively and actively learn," Maxfield explains. "Instead of passively listening to a lecture during class time, students watch online pre-recorded videos and take instant-feedback assessments beforehand. This allows for active, in-class collaborative and creative interactions, such as group problem solving and discussions, directed by their professors."

For more information, visit the course website at

Written by Kathy Svitil