Building a Data Science Portfolio: A Newcomer’s Guide
The point of building a data science portfolio is to demonstrate your skills to prospective employers. As someone with six years of experience hiring data professionals for my own consultancy firm, and who’s worked with 10% of Fortune 100 companies on data projects, I know a thing or two about what businesses are looking for when it comes to hiring data workers. So before going into too much detail about how to build your portfolio, let’s identify what kind of skills these prospects are actually looking for.
Make sure to stick around until the end if you want to learn what types of projects to showcase, the best places to publish your portfolio, and three examples of brilliant data portfolios (plus a special bonus shortcut to get noticed FAST as a data professional!).
A Newcomer’s Guide to Building a Data Science Portfolio
Let’s be real. When it comes down to it, prospective employers are looking to hire data scientists who generate monetary value by either reducing wasteful spending or increasing revenues. That’s why you want to make sure your CV is value-driven, and not just the usual litany of the hundreds of skills you’ve acquired over the course of your career.
If you want to learn more about showcasing your value by creating profitable data projects for companies, you’ll want to check out my latest product for ambitious data professionals, Winning With Data. By learning data strategy and leadership skills, you’ll be able to reach the next rung on the data science career ladder faster than if you simply stick to data implementation skills.
Although it’s not always easy to showcase the value you’ll add solely through your data science portfolio, you can highlight your valuable expertise and data science skills. While each individual prospective employer is looking for something a little different, the good news is that there are some fundamental skills common to most data science roles. Those are:
- Programming in Python and/or R
- Data munging
- Predictive modeling
- SQL experience
- Data storytelling
- Personality attributes: Team-player, problem-solver, and tenacious
Just by taking the time to publish a coding portfolio, you’re showing potential employers that you’re committed and passionate about the data science field. This dedication helps to demonstrate you have the personality attributes that prospective employers are looking for.
Are you a newbie data professional who wants more info on breaking into the data science industry? This post was originally published in my free e-book, A Badass’s Guide to Breaking into Data. Get the 52-page guide to getting started in the data professions, by either getting a data job or starting your own small consultancy, here.
Deciding where to publish your portfolio
When it comes to building a data science portfolio, there are a variety of good options on where to go to publish your work. Personally, I prefer to publish Jupyter Notebooks on GitHub for Python and RPubs for R code. You can, of course, also publish your code to Kaggle.
The next option is to publish your portfolio on your blog or website, along with some explanation of the concepts you’re demonstrating. Doing this allows you to show off your technical communication skills. People who can communicate technical concepts in plain language are highly sought-after. You can use your blog and coding portfolio as a place to practice this through writing and videos.
As far as publishing code to your blog, that’s made easy by using embedded viewers. The embeddable viewer for Jupyter Notebooks is called nbviewer, and for R is RPubs (here are the instructions for that).
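To make the nbviewer option a little more concrete, here is a minimal sketch of how you might build a shareable nbviewer link for a notebook that lives in a public GitHub repository. The user name, repository name, and notebook path are all hypothetical placeholders, and the URL pattern is my understanding of nbviewer’s GitHub route, so paste the result into a browser to confirm it renders before you publish it.

```python
from urllib.parse import quote

# Assumed base route for notebooks hosted in public GitHub repositories.
NBVIEWER_BASE = "https://nbviewer.org/github"


def nbviewer_url(user, repo, notebook_path, branch="main"):
    """Build an nbviewer link for a notebook stored in a public GitHub repo."""
    # Percent-encode spaces and other unsafe characters, but keep the slashes.
    path = quote(notebook_path.lstrip("/"))
    return f"{NBVIEWER_BASE}/{user}/{repo}/blob/{branch}/{path}"


def html_link(url, label="View the notebook on nbviewer"):
    """Return a plain HTML link you can paste into a blog post."""
    return f'<a href="{url}">{label}</a>'


if __name__ == "__main__":
    # Hypothetical account, repository, and notebook names.
    url = nbviewer_url("jane-doe", "portfolio", "notebooks/churn model.ipynb")
    print(url)
    print(html_link(url))
```

A plain link is the most conservative choice here. Whether your blog platform lets you embed the rendered notebook directly is something to check for your own setup.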
You might be wondering which option is better for you: publishing your portfolio on GitHub or creating your own site to showcase your work. I’ll outline a few pros and cons of each option.
Creating Your Portfolio on GitHub
There are several advantages to publishing your portfolio on GitHub, as well as a few downsides.
Pros:
- GitHub is the industry standard. Most prospective employers will be familiar with it and understand how the interface works.
- It is relatively quick and easy to set up your projects on GitHub. You won’t need to spend hours setting up a site.
- There is no cost to hosting your projects on GitHub.
Cons:
- Because it is so common to display work on GitHub, potential employers may not be quite as impressed with your efforts.
- Displaying data visually is not as easy on GitHub. While GitHub is great for code snippets, visual work like slideshows, charts, and data storytelling is sometimes better showcased on a dedicated portfolio site.
- You must remain active on the platform and continue to put effort into your presence there. A sparse GitHub profile with months of inactivity does not send a good signal to potential employers. Once you’ve created your portfolio, engage with other users, go back and update old projects, contribute to open source, etc.
Creating Your Portfolio on a Dedicated Portfolio Site
Perhaps you’re thinking you’d rather take your data career up a notch by creating your own dedicated data science portfolio site. Before you take the time to do so, read this.
Pros:
- It shows employers your dedication and willingness to “go the extra mile,” since not all candidates will take the time to set up their own blog or website.
- It’s easier to visually display your work and go beyond coding snippets. This is especially valuable if you are at a networking event or conference and want to quickly pull up a project to showcase your skills and talents.
- In addition to portfolio projects, you can add a blog, videos, slideshows, and other pieces of content to showcase your expertise.
Cons:
- While many employers will be impressed with your website, some would prefer to simply check out your GitHub account.
- It will take more time and effort to set up your own dedicated portfolio site. There is an upfront time investment.
- There may be costs to self-hosting your website.
In my opinion, the best option is actually to do both. For demonstrating your data expertise, it is best to first build a data science portfolio on GitHub that shows off your coding skills. Then, as an added bonus, create a visually appealing site to showcase your data projects that also works to highlight your technical communication skills.
Deciding what to publish when building a data science portfolio
Ultimately you want to be building a data science portfolio that concisely demonstrates your ability to carry out all of the data science tasks that’ll be required of you. To that end, I’d consider building a data science portfolio that shows people you know how to do:
- Data munging – In other words, show people how to clean, restructure, and reformat raw data into the form you need for use in modeling and analysis.
- Describing and inferring – Use statistical methods to describe and make inferences from your cleaned datasets.
- Data showcasing and story-telling – Here is where you show your proficiency at communicating data insights to different types of audiences.
- Predictive modeling and machine learning – Demonstrate how you’re able to use machine learning methods to make predictions (hopefully predictions that are relevant to business).
You can put these all together piecemeal, or build an end-to-end project that walks through each of the important components. The latter is probably the better bet.
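If it helps to picture what an end-to-end project covers, here is a deliberately tiny sketch that touches each of the four components above: munging, describing, modeling, and storytelling. Everything in it is an assumption made for illustration: the ad spend and revenue figures are made up, the function names are hypothetical, and a one-variable least-squares line stands in for whatever model suits your real project. It uses only the Python standard library to stay dependency-free. In a real portfolio piece you would more likely reach for the usual data science libraries.

```python
import csv
import io
from statistics import mean, median

# Made-up raw data with the usual messiness: stray spaces and missing values.
RAW_CSV = """month,ad_spend,revenue
Jan, 1000 ,5200
Feb,1500,6900
Mar,,7100
Apr,2000,8800
May,2500,n/a
Jun,3000,12100
"""


def munge(raw_text):
    """Data munging: parse the CSV and drop rows with missing or bad numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        try:
            rows.append({
                "month": record["month"].strip(),
                "ad_spend": float(record["ad_spend"]),  # float() ignores padding
                "revenue": float(record["revenue"]),
            })
        except ValueError:
            continue  # skip rows such as "" or "n/a"
    return rows


def describe(rows):
    """Describing: summary statistics for the cleaned data."""
    spend = [r["ad_spend"] for r in rows]
    revenue = [r["revenue"] for r in rows]
    return {
        "n_rows": len(rows),
        "mean_ad_spend": mean(spend),
        "mean_revenue": mean(revenue),
        "median_revenue": median(revenue),
    }


def fit_line(rows):
    """Predictive modeling: ordinary least squares for revenue ~ ad_spend."""
    xs = [r["ad_spend"] for r in rows]
    ys = [r["revenue"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, ad_spend):
    """Predicted revenue for a given ad spend under the fitted line."""
    return intercept + slope * ad_spend


def tell_story(summary, slope):
    """Storytelling: turn the numbers into a sentence a manager can act on."""
    return (
        f"Across {summary['n_rows']} clean months, each extra dollar of ad "
        f"spend was associated with about ${slope:.2f} of additional revenue."
    )


if __name__ == "__main__":
    rows = munge(RAW_CSV)
    summary = describe(rows)
    slope, intercept = fit_line(rows)
    print(summary)
    print(tell_story(summary, slope))
    print(f"Forecast at $3,500 spend: ${predict(slope, intercept, 3500):,.0f}")
```

With only four clean rows, the fitted line is an illustration of the workflow and not a finding. The point is that a reader can follow one dataset from raw file to business-facing sentence in a single place.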
I teach many of the above topics in my LinkedIn Learning classes. If you are a student of mine, showcasing your course learnings and projects is an excellent way to beef up your portfolio!
Some excellent examples to inspire your portfolio
When you’re building a data science portfolio, it’s always nice to look at some examples for some inspiration. I have been quite impressed and inspired by the following data science portfolios:
Donne Martin is a Software Engineer at Facebook with a focus on data privacy. He’s also an avid open-source contributor, having created and contributed to many popular projects on GitHub. You can view his portfolio site or check out his LinkedIn profile.
Why his portfolio stuck out to me: Donne Martin has made thousands of open-source contributions on GitHub over the last few years. That alone is enough to merit his inclusion in this post!
Sebastian Raschka is an assistant professor of statistics at the University of Wisconsin-Madison, with a focus on machine learning and deep learning. He is also the author of the textbook “Python Machine Learning.” Take a moment to look through his portfolio, GitHub account, and LinkedIn profile.
Why his portfolio stuck out to me: Sebastian Raschka’s online coding demos have literally changed the trajectory of many data professionals’ careers. If you want to learn to do data science, you actually don’t need to take courses… Just learn along with Raschka’s coding demos and book (although that would be the slower, harder way of going about things)!
Jake Vanderplas is a software engineer at Google on the Colab team, as well as a developer on a number of open-source Python projects. View his projects on GitHub or his site, or check out his LinkedIn profile here.
Why his portfolio stuck out to me: Well, 10k followers on GitHub is not bad, but the real reason I admire Vanderplas’s GitHub portfolio is the sentiment behind it. When you look over his contributions, it is clear that he is driven by a desire to help data professionals and the data industry at large, making data science more accessible to people around the world without any monetary gain for himself (although he does get paid for his work at Google, so there’s that). He takes time in his explanations and makes them all freely available on GitHub, along with all the code to support his free demos.
You may notice that I left myself off the list. If you’re wondering, “What about you, Lillian? Where’s your data science portfolio?” Well, as a matter of fact, I pretty much use my Lynda and LinkedIn Learning courses as a coding portfolio, which to date I’ve used to train over one million data professionals in data science worldwide. Although I have published some demos on GitHub, RPubs, and my blog, I’ve been so busy with paid work that I haven’t had the time or interest to do more.
And while we’re on the topic of paid work, let me point out that paid data work is my goal for you too. My entire purpose with this recent series of blog posts was to get you moving in the right direction so that you, too, can find your way into the same position as me (the position of having so many opportunities for paid work that you no longer have the time that’s required for building a data science portfolio). I’m so grateful to have had the chance to serve so many incredible businesses through the work I do at Data-Mania, including companies like:
Now, if you prefer to skip all of the long, hard years of building a portfolio and working like crazy to master each and every new data implementation technique, there’s good news. You can use data to create the impact you crave by becoming a data leader instead.
With my new product, Data Strategy Action Plan, I teach you the essential data strategy skills you need to quickly reach the next rung on the data science career ladder.
Data Strategy Action Plan is a step-by-step checklist & collaborative Trello Board planner for data professionals who want to get unstuck & up-leveled into their next promotion by delivering a fail-proof data strategy plan for their data projects.
GET THE DATA STRATEGY ACTION PLAN FOR JUST $37 HERE.