Tom G Herman Harvard STAT S-100 Class

Harvard Extension School Introduction to Statistics and Applied Data Analysis (STAT S-100) Course Review

This was my first-ever course through Harvard Extension School, and my first four-credit university-type course since I graduated from the University of Colorado in 2006 with a double major in economics and finance. This was also my first course in pursuit of a Data Analytics Graduate Certificate from Harvard Extension School. The Data Analytics Graduate Certificate requires 16 credits across four courses: an optional “intro” course (if needed to meet prerequisites), a required statistics course (STAT E-109, Introduction to Statistical Modeling), and two elective courses. Since STAT S-100 is a prerequisite for STAT E-109, I took STAT S-100 in the summer of 2024 as my “intro course,” with the plan to take CSCI E-96 (elective course – Data Mining for Business) in fall 2024, followed by STAT E-109 in spring 2025.

And whoooooo boyyyyy, this course was a doozy! I expected this to be challenging and fast-paced, but little did I know what I was signing myself up for. Since I took this class during the summer, it was compressed into only six weeks, whereas a typical spring or fall class would be spread out over 16 weeks, allowing much more breathing room and time to actually absorb the material. There were definitely points where I wondered if I’d make it out of here with at least a B (thankfully I exceeded expectations).

This compressed timeframe meant there were two lectures per week, at three hours each. There were also “optional” one-hour sections later in the day to go over the lab and section exercises. I put “optional” in quotes because I found these to be critical to my success. These sections often went over key concepts that showed up again in the problem sets that were due each week.

I found the problem sets to be extremely difficult, and it was necessary to attend office hours pretty regularly to get help. While the concepts made sense in class and section, there were usually little twists that really stretched my brain. The midterm and final exams were in a similar format to the problem sets, except the topics were more varied and there were more problems to complete.

The main topics covered by the course were probability, distributions of random variables, inference, p-values, regression, multiple linear regression, logistic regression, and classification. The concepts of inference and regression kept building on each other as the course progressed, and we used the R programming language throughout.

Overall, the instructors were top-notch and delivered the lectures in a very structured way. There was also a head teaching facilitator (TF) supported by half a dozen additional TFs who maintained a regular schedule of office hours (and even additional time slots from time to time). These office hours were extremely helpful and played a big part in my success in this course. There was also a small “participation” component to the course where you would read an article of your choice and post a writeup on Slack with your thoughts, what you found interesting, and how it related to what we were learning in the course. The course Slack channel was pretty active and the TFs were quick to respond to questions.

There were some recommended (but not required) textbooks that I purchased. In hindsight they were not useful, because you can get everything you need directly from the course. Also, the books are not printed in color, which is a major drawback when you’re trying to understand concepts like clustering and probability (where things were obviously color-coded in the original prints and online).

Overall, this was a surprisingly challenging course, but I’m glad I got it out of the way quickly during the summer. I learned a ton, and much of what I learned was directly applicable to my next class, Data Mining for Business. I would highly recommend this course to anyone trying to beef up their statistics and R programming skills and/or pursuing a Graduate Certificate like me.

Here are some of my keys for success that I would suggest to anyone thinking about taking this course:

  • Think carefully about whether you want to take this during summer session. If you already have a full-time job and other responsibilities like I do, this class will be like drinking from a fire hose. The advantage is you’ll get through it quickly and you can accelerate your timeline toward your Graduate Certificate or master’s degree.
  • Take an introductory R class. I did some introductory R programming before this class, but even more would have helped. You will need to know how to wrangle data, perform exploratory data analysis (EDA) and build models. Also, most of the future classes you take will likely use R, so you’re just helping your future self by learning it now.
  • Do all the labs and section exercises. These will directly help you solve the weekly problem sets. They will also help you formulate more intelligent questions for the TFs in office hours if you get stuck (which you will — constantly).
  • Go to office hours! This helped me big time. Most of the TFs were very friendly and eager to help. Look up their scheduled office hours and make it a point to attend. Even if you don’t have an immediate question, other students might ask questions that lead to “Aha” moments for you. Some TFs were even available by appointment for one-on-one sessions.
  • Consider skipping the textbooks if they’re not required (especially during the summer). I hardly used them and couldn’t keep up with the reading toward the end of the semester anyway.
  • Have faith and just do the work. This class was fast-paced, stressful and made me question myself at times. As long as you attend the lectures and put in the time and work, you’ll be fine.

Good luck!

– Tom