
Extending Simple Network Mapping – TCP Port Descriptions

Simple Network Mapping

In an earlier post I demonstrated how you can use a very simple toolset, such as VMware Tools and netstat, to create visualisations of which devices are talking to each other on a network. I had a few thoughts of my own on some additional simple steps that could be taken to improve the data and visualisation still further, and a few conversations on Twitter and LinkedIn convinced me to pull them together and write some blogs about it.

So, the thoughts that I have:

  • Matching the TCP ports in use to known services and applications
  • Integrating ARP data
  • Matching MAC addresses and adding hardware vendors
  • Scheduling and scaling collection

If you enjoy what you’re reading and have some suggestions for how to extend it, feel free to get in touch via Twitter.

Where are we picking up from?

In place we have a visualisation referencing netstat and DNS data from the subnet in the lab.

The various elements above are: the main area is reserved for the Network Navigator visualisation, showing the relationships between nodes; five slicers on the right-hand side allow data interrogation; and the last section displays the raw data.

The Goal

I’m going to start with the first bullet listed above, matching the TCP ports in use to known services and applications. The benefit of doing this is that it removes some of the specialist knowledge required to gain value from the visualisation. For example, someone might well know a service by name but not by its port number. A port number is just that, a number; the description names the service, which is much easier to link to an activity or business process.

TCP 1433 might not mean anything, but ‘Microsoft SQL Server database management system (MSSQL) server’ is much more meaningful.

In order to do this I’ll need to take the following steps:

  1. Find a reputable source of TCP port information
  2. Import and transform the data
  3. Merge with our existing data set for local and foreign ports
  4. Edit the visualisation

Step one is relatively straightforward. The Internet Assigned Numbers Authority (IANA) is responsible for maintaining the official assignments of port numbers for specific uses, so that was my first port of call. However, whilst browsing I found the Wikipedia page ‘List of TCP and UDP port numbers’. The data on this page was already in a format well suited to use in Power BI.

Import and transform data

Power BI can import data from many sources, including HTML hosted on the internet, and I’m going to take advantage of this.

Having selected the source data, I’m prompted to select what I’m interested in on the page:

As you can see, the listed tables follow the table names from the website, and as this is Wikipedia the HTML tables have meaningful names, which makes this very simple to work with. I’m interested in the tables ‘Registered Ports’ and ‘Well-Known Ports’; between them these tables provide information for TCP ports from 0 to 49151. It should also be said that, as this is Wikipedia, the data is dynamic and will update when we refresh the data connection. This comes with the risk that the formatting might change, but that seems unlikely for a data set and page such as this.

After importing both tables that we’re interested in, I end up with the following:

What I’m going to end up with is this:

You’ll notice that I’ve got two columns for port number; this is because the data set bundles some ranges together. Examples of this are ‘Nintendo Wi-Fi connection’ on port range 29900-29901 and ‘Steam (downloads)’ on 27015-27030. It’s a little annoying to work with because, in an ideal world, I’d have a row per port, and it also mixes text and numeric characters in what should be a numeric field. To make the data set easier to work with, I’ve used the ‘-’ as a delimiter to split the column. This provides values for the start and end of the port range, which I can work with, and if needed we can build logic later to check whether a value falls within those ranges.
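The split, and the range check it enables, can be sketched in Python. This is an illustration of the idea rather than the Power Query step itself, and the function names are my own:

```python
def split_port_range(value):
    """Split a port field like '27015-27030' into (start, end).

    A single port such as '1433' yields the same start and end,
    which keeps any later 'is port in range' check uniform.
    """
    parts = value.split("-", 1)
    start = int(parts[0])
    end = int(parts[1]) if len(parts) > 1 else start
    return start, end


def port_in_range(port, value):
    """Check whether a numeric port falls within a range field."""
    start, end = split_port_range(value)
    return start <= port <= end
```

For example, `split_port_range("29900-29901")` returns `(29900, 29901)`, while `split_port_range("1433")` returns `(1433, 1433)`.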

Reference: If you store data in a table, do think about the data type that is going to be stored: is it numeric (whole, decimal, percentage), a date or time, a true/false, or a text field? What do you want to integrate the data set with? Thinking and planning this out ahead of time will save effort and/or disappointment later on.

To prove the point, I didn’t really think about this in the previous post and ended up modifying the ‘Local’ and ‘Foreign Port’ columns in my output table so that it was easier to merge with the port information from Wikipedia.

Again, rather than going step by step with screenshots, here is the query code I used for transforming the registered and well-known port data from Wikipedia.

Apart from adding the columns for port ranges, all this is really doing is stripping out information that I’m not interested in, such as non-standard ports.
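For readers without the workbook to hand, the shape of that transform can be sketched in Python. The field names and the exact filter rule are illustrative assumptions, not the original query:

```python
def transform_ports(rows):
    """Tidy a raw Wikipedia port table.

    Each input row is a dict such as
    {"Port": "27015-27030", "TCP": "Yes", "Description": "Steam (downloads)"}.
    Rows that aren't official TCP assignments are dropped (assumed filter
    rule), and the Port column is split into Port Start / Port End.
    """
    out = []
    for row in rows:
        # keep only rows marked as TCP (illustrative stand-in for the
        # real query's removal of non-standard ports)
        if row.get("TCP") != "Yes":
            continue
        start, _, end = row["Port"].partition("-")
        out.append({
            "Port Start": int(start),
            "Port End": int(end) if end else int(start),
            "Description": row["Description"],
        })
    return out
```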

We’ve now got two data sets from Wikipedia, reflecting the two published tables that I’ve linked to. It makes no sense to keep these as separate data sources: the data has been treated in exactly the same way, so the column count and headers match. Therefore I can just append the data sets:

I could have appended to an existing data set, but for simplicity (for me) I appended to a new one. I have therefore added three data sets to my visualisation: ‘Registered Ports’, ‘Well-Known Ports’ and the appended output ‘All Ports’.
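Because both tables share the same columns, the append is conceptually just a union of rows. In Python terms (with illustrative sample rows):

```python
# Two tidied tables with identical columns (sample rows for illustration)
registered_ports = [
    {"Port Start": 1433, "Port End": 1433,
     "Description": "Microsoft SQL Server database management system (MSSQL) server"},
]
well_known_ports = [
    {"Port Start": 443, "Port End": 443, "Description": "HTTPS"},
]

# Appending to a new data set is a straight concatenation,
# mirroring Power BI's 'Append Queries as New' into 'All Ports'
all_ports = registered_ports + well_known_ports
```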

Note: If you have a custom application, or have customised ports for off-the-shelf software and services, add an element to the query to find the ports it uses and include this information. A simple way of achieving this would be to add a conditional column to the data set that uses the description information for all ports apart from those for the custom application, which you could add as free text.

Merge with our existing data set for local and foreign ports

I’ve got port information in my output and ‘All Ports’ data sets, so I can perform a left outer join against my existing output query. I’ll use a left outer join as this takes all the data from my first selection and adds matching data from the second. In theory this will allow me to introduce a column that contains the in-use port descriptions. I’ll need to do this twice as I’m tracking both local and foreign active ports.
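The behaviour of the left outer join here can be sketched as follows; every netstat row is kept, whether or not a description is found (column names are illustrative):

```python
def left_outer_join(netstat_rows, port_rows, port_field):
    """Left outer join: keep every netstat row, attaching the matching
    port description where one exists, otherwise None."""
    lookup = {p["Port Start"]: p["Description"] for p in port_rows}
    joined = []
    for row in netstat_rows:
        merged = dict(row)
        merged["Description"] = lookup.get(row[port_field])
        joined.append(merged)
    return joined
```

Running this twice, once keyed on the local port and once on the foreign port, mirrors the two merges performed in the query editor.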

After performing the merge I’m left with what looks like a new column in my query; to select the data that I want to display from the merge, I need to click on the arrow heads and make my selection:

I’m only interested in the description, as this is what will add value to my final visualisation. Having merged the ‘All Ports’ data against the foreign port information as well, I have the following:


Edit the visualisation

I don’t need to do anything complicated with the visualisation; after all, all I’ve done is merge in some supplemental information. In the original visualisation we had two slicers to allow filtering on the numerical values of the local and foreign ports; I’ve changed these to reference the descriptions we’ve pulled from Wikipedia instead.

This now allows the filtering and interrogation of the data using the port descriptions, which makes the visualisation somewhat easier to navigate. For example, I no longer need to know that RDP uses TCP 3389 to investigate the flagged unknown connection.
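What the description slicer is doing amounts to a simple case-insensitive filter on the merged column, something like:

```python
def filter_by_description(rows, term):
    """Case-insensitive substring filter on the merged Description
    column, mirroring what the slicer does in the report. Rows with
    no description (unmatched ports) are excluded from the result."""
    term = term.lower()
    return [r for r in rows
            if r.get("Description") and term in r["Description"].lower()]
```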


The goal was to merge TCP port descriptions into the visualisation, removing some of the specialist knowledge required to gain value from the data as it is presented. In practice this was very simple to do, and even to extend further to include custom port and application/service information. It might not look that pretty, but functionally I would suggest that the goal has been met.

Thanks for reading. As I mentioned before, if you enjoy what you’re reading and have some suggestions for how to extend this further, drop me a DM on Twitter.