Approaching a dataset with visualization

I’d like to give my students some simple guidelines for how to use data visualization to look at a new dataset. What to do first, second, and so on. Here’s what I’m going to suggest.

Examine individual variables

First, take one variable at a time. Which are the most important ones, considering the audience and the purpose of your work? What are the mean, median, and mode? Your first visualizations, accordingly, may be histograms or box-and-whisker plots, maybe Pareto diagrams. These go beyond the summary statistics by showing us the overall “shape” of the distributions, revealing things like Normal distributions, skewness, and fat or thin tails.
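As a rough sketch of that first step, here is one way to compute the three summary statistics and a crude text histogram using only Python's standard library. The sample values and the text_histogram helper are invented for illustration; in practice you would more likely reach for a plotting library.

```python
import statistics
from collections import Counter

# Hypothetical sample, invented purely for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

mean = statistics.mean(values)      # pulled upward by the outlier (12)
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value


def text_histogram(data, bin_width):
    """Count values per bin; return (bin_start, count) pairs sorted by bin start.

    Bins with no values are simply omitted.
    """
    counts = Counter((x // bin_width) * bin_width for x in data)
    return sorted(counts.items())


print(f"mean={mean}, median={median}, mode={mode}")
for start, count in text_histogram(values, 2):
    # One '#' per observation gives a quick view of the distribution's shape.
    print(f"{start:>3}-{start + 1:<3} {'#' * count}")
```

A mean noticeably above the median, as in this sample, is a hint of right skew, and the long gap before the last bar in the histogram shows where that skew comes from.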

Compare subsets of the data on single variables

Once you have a sense of how the data is distributed overall, you can begin slicing and dicing it by some categorical dimension(s). This can be as simple as a bar chart comparing a single...


Sep 20, 2017

Five reasons to try API-first development

I confess that I’m a data geek. When I first discovered relational databases in this tutorial by Jay Greenspan over 15 years ago, I began to see the world in third normal form. When I learned about RESTful web services from Michael Mahemoff’s Ajax Design Patterns, I started to think about applications in terms of requests and responses. I love discovering well-crafted JSON web services and showing them to my students, and so the idea of creating a public API to let other people obtain data from one of my databases or applications is not at all alien to me.

But it never occurred to me to build the API first.

I got the idea while developing a data engineering class. I wanted to emphasize the point to my students that analytics is about building data products for others to use and that, before we got into the nitty-gritty of how to use various types of databases, they should be able to...


Sep 20, 2017

Querying the data

This is the third (draft) chapter of a new book on relational databases (using Postgres) that I’m working on as a side project. Stay tuned for additional chapters. The book under development can also be viewed at Leanpub, which supports commenting, and also will allow me to bundle the book with video lectures. I appreciate your feedback!

Sorry the tables and LaTeX equations don’t work in Svbtle… check out the e-book to see how they’re supposed to look.

A declarative query language

You have already seen some of the Structured Query Language (SQL), which is used to express queries in Postgres (and every other relational database that I know of), and you’re going to see a lot more in this book’s chapters. You have “programmed” several queries, but here’s one thing you may not know: SQL is not a programming language. A computer program written in a language like Python, Java, or C++ is...


Dec 17, 2016

The relational model

This is the second (draft) chapter of a new book on relational databases (using Postgres) that I’m working on as a side project. Stay tuned for additional chapters. The book under development can also be viewed at Leanpub, which supports commenting, and also will allow me to bundle the book with video lectures. I appreciate your feedback!

There are other data models

For a couple of decades (roughly 1985-2005), the relational data model was the only game in town: you had to learn it, and there was no reason for a textbook to argue the point. Today, data engineers have a lot of other options. Document-oriented databases are booming in popularity with app developers for their ease of use; graph databases have captured the imagination of researchers and tinkerers because of their natural fit with social network applications; and the analytics world has found performance advantages to...


Sep 29, 2016

The Purpose of Excel

At lunch today, I was telling a colleague that in my “Introduction to MIS” course at UMaine, because the students will take a full-semester Excel course later, I have tried to demonstrate the business purpose for the software rather than the nuts-and-bolts of how to make it go.

His reply was, “That would be a great name for a course!” So today I’m thinking about how I might teach The Purpose of Excel as a university course. It may not be a course, but I’d bet I could come up with a few good chapters or essays on it.

So what is the purpose of Excel?

Excel is a great tool that can be used for everything from simple calculations (a substitute for a calculator) to graphic design (anything consisting of rectangles). Its business purpose, though, is to aid people in making decisions by creating simple models, which might better be called simulations, of business scenarios, and enabling...


May 24, 2016

How databases fit in

This is the first (draft) chapter of a new book on relational databases (using Postgres) that I’m working on as a summer project. Stay tuned for additional chapters. The book under development can also be viewed at Leanpub, which supports commenting, and also will allow me to bundle the book with video lectures. I appreciate your feedback!

Introduction

Imagine that you work in a small direct-response mail order company that takes orders from customers by phone. Each agent in the call center downstairs has a stack of paper order forms on his desk, and when he receives an order he writes down the product name(s), quantity ordered, and the customer’s address and payment information. He uses a calculator or computer to sum up the order total, and tells the customer how much they’ll be charged.

Periodically, a data entry worker visits the call center and picks up stacks of filled-in...


Nov 19, 2015

Understanding the “M” in MVC: a database nerd tries to learn how SQLAlchemy ORM fits in with Flask

Having cut my teeth as a web developer in the bad old days of spaghetti code (when PHP was the innovative new thing!), I came back to it after several years away and have discovered with delight the new species of web framework they call MVC: model, view, controller.

My favorite so far is Flask, a Python-based microframework. Unlike the leading Python framework (Django), it’s a minimalistic framework that doesn’t make many decisions for you. I like to go slow and figure things out on my own, so that’s perfect for me. A simple Flask application is structured like so:

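from flask import Flask

# Create the application object; __name__ tells Flask where to find resources.
app = Flask(__name__)

# Map the site root ("/") to a function that returns the response body.
@app.route("/")
def hello():
    return "<h1>Hello World!</h1>"

# The <name> segment of the URL is passed to the function as an argument.
@app.route('/user/<name>')
def user(name):
    return "<h1>Hello, %s!</h1>" % name

# Start the development server only when this file is run directly.
if __name__ == "__main__":
    app.run()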

What’s great about this is that instead of having PHP or some other kind of code...


Nov 4, 2015

Is science design? Prototyping in academic research.

I have argued that design is a science, with particular reference to information systems development. There is a philosophy and a body of theory on how to design software, data, and socio-technical information systems in organizations. But what if the output we are working toward is scientific knowledge? I believe, but have not yet proven, that we can treat science itself as a design activity. In the spring of 2016, my capstone class will take part in an experiment to figure out just what this might mean.

What kind of problem is science?

Horst Rittel and Melvin Webber of UC Berkeley articulated the concept of “wicked problems” in a noteworthy 1973 paper. Their purpose in using this term was to argue that problems like public policy cannot be solved by simply hiring experts to apply professional knowledge (“science”). Problems of social policy were different, they argued, because...


Oct 21, 2015

Design is a science.

Philosophy of the artificial

For many centuries, there has been a distinction made between natural science on the one hand and engineering or applied science on the other. The natural sciences, physics, chemistry, biology, etc., are branches of philosophy—the pursuit of truth. Along with the humanities, the natural sciences were recognized as important parts of a liberal education and enjoyed great respectability among academics. More recently, the social sciences such as psychology and sociology have taken their place in academia.

By contrast, the various applied and professional disciplines have been seen as non-philosophical, concerned with usefulness rather than truth. Although engineering, medicine, business, law, education, architecture and similar problem-solving disciplines did convey useful bodies of knowledge, it was hard to see any “theory” in them. After all...


Oct 16, 2015

A capstone course in information systems

An education for problem solvers

In an undergraduate information systems degree program, we strive to prepare students to be valuable employees, effective managers, and capable entrepreneurs using information technology in business. This is not to say that they should be exclusively focused on commerce. Everyone who is trying to accomplish anything in the world—entrepreneurs, government, nonprofits, artists—has “business”. Being a part of the business school distinguishes information systems (MIS, CIS, whatever) from other programs like computer science, not because we have a different domain of applications, but because business students are trained to pay attention to problems that need solving, rather than “solutions looking for problems”.

We ought to teach students how to approach problem solving with IT, not which problems are worth solving. Jeff Hammerbacher, one of the...


Joseph Clark

MIS lecturer at UMaine. Fascinated by craftsmanship in products and systems.