Since Luther announced that it will offer a new major in Data Science starting next fall, I’ve been answering a lot of questions. What is data science? What does one do with a data science major? What does a data scientist do? Why study data science at Luther? These are great questions, and they are all ones we thought about as we put together the major. In this post, I’ll try to answer them and provide some context.
We are Awash in Data
In my own lifetime I've seen the personal computer start out with 48 kilobytes (KB) of memory and external storage on a cassette tape. A few years later the first floppy drive stored about 48KB as well. In 1986 my wife and I purchased a 128K Macintosh as our wedding gift to each other. A few years later we bought our first hard drive. It was the size of a shoebox and held 20 megabytes (MB). It seemed like an infinite amount of storage. Today my iPhone, a fraction of the size of that hard drive, holds 128 gigabytes (GB) of data. Back in the '90s we used to talk about "information overload." We described the volume of information published on the internet as something like drinking water from a firehose instead of a straw. Today that metaphor seems quaint, maybe even antiquated.
The volume of data we generate every day is staggering. The amount of data that Google alone stores on its servers is estimated to be 15 exabytes: that is 15 followed by 18 zeros, counted in bytes! For those of you who remember punch cards, you can visualize 15 exabytes as a pile of cards three miles high covering all of New England! Everywhere you go, someone or something is collecting data about you: what you buy, what you read, where you eat, where you stay, how you drive your car.
What does it all mean?
Often this data is collected and stored with little idea of how to use it. I myself am guilty of that in my own research. Other times the data is collected quite intentionally. The big question is “What does it all mean?” That is where data science comes in. Data science is an emerging field that brings together ideas that have been around for years, or even centuries. A common definition describes data science as "an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms."
Data Science in a Liberal Arts Context
As an interdisciplinary field of inquiry, data science is perfect for a liberal arts college. Combining statistics, computer science, writing, art, and ethics, data science has applications across the curriculum: biology, economics, management, English, history, even music. The best thing about data science is that the job of a data scientist seems perfectly suited to many of our students.
The best data scientists have one thing in common: unbelievable curiosity. - D.J. Patil, Chief Data Scientist of the United States
Eric Haller, VP of Experian, a global information services company, put it this way when he was recently interviewed by the Chicago Tribune:
A data scientist is an explorer, scientist, and analyst all combined into one role. They have the curiosity and passion of an explorer for jumping into new problems, new data sets and new technologies. They love going where no man has gone before in taking on a new approach to taking on age old challenges or coming up with an approach for a very new problem where nobody has tried to solve it in the past.
They can write their own code and develop their own algorithms. They can keep up with the scientific breakthrough of the day and regularly apply them to their own work. And as an analyst they have a penchant for detail, continually diving deeper to find answers. Finding treasure in the data, analysis and the details give them an adrenaline rush.
Our data scientists tend to operate with a noble purpose of trying to do good things for people, businesses and society with data.
However, all of this exploration and analysis means nothing if you cannot communicate it to people. In a recent Harvard Business Review article titled "A Data Scientist's Real Job: Storytelling," Jeff Bladt and Bob Filbin elaborate:
Using Big Data successfully requires human translation and context whether it's for your staff or the people your organization is trying to reach. Without a human frame, like photos or words that make emotion salient, data will only confuse, and certainly won't lead to smart organizational behavior.
Stories are great, but in data science you had better make sure they are true, especially when you are dealing with stories about numbers. In a recent article titled "The Ethical Data Scientist," the subtitle really tells the story: "People have too much trust in numbers to be intrinsically objective." The better-known saying is that “Statistics don’t lie, but statisticians sometimes do.” The challenge for the data scientist is to avoid the trap of choosing only the statistics that tell the story they want to tell. As the article puts it:
The ethical data scientist would strive to improve the world, not repeat it. That would mean deploying tools to explicitly construct fair processes. As long as our world is not perfect, and as long as data is being collected on that world, we will not be building models that are improvements on our past unless we specifically set out to do so.
As the current semester winds down and I begin to prioritize my tasks for the summer and think about fall class preparation, I’m really excited about teaching two new courses next fall, and a bit daunted by the responsibility of building the foundation for Luther’s new data science program.