September 12, 2011

Xbox Live, NAT, and You

I've always been a network geek.  I've always heard of people with issues with NAT and Xbox Live and thought "They just must not know what they're doing" and dismissed it as an 1D10T error.

Until now.

Now that I'm in a job that requires me to understand Live and NAT, it seriously makes me wonder about the forethought put into the creation of the Live service.  They admittedly do some really cool stuff to get around some of the NAT problems, but I can't help but think that it's actually over-engineered.

What follows will be a HIGHLY technical article.  While I'm going to make every effort to make this as accessible as possible to Joe Everyman, it is without a doubt still very technical.  If you're not curious about the technical nature of Live, have no understanding of networking and don't care, or otherwise don't feel the need to know what's under the hood of your 360, you can click off now, as this article isn't for you.  It will, without a doubt, put you to sleep.

I'll give the uninterested people a few minutes to disperse.

Still with me?

Good.

Here comes more than you'll probably ever want to know about Xbox Live and NATs.

Let's begin with the basics for the uninitiated few who did stick around.  Every device connected to the internet gets an IP (Internet Protocol) address, and that address cannot be used by any other device.  Years ago, when it became clear that IPv4 addresses were a scarce resource (we're actually out of them now!), smart people established NAT (Network Address Translation) as a way to share a single address among multiple devices.  A NAT router is the piece of magic that allows all of your devices at home to harmoniously use the single IP address that nearly every ISP (Internet Service Provider) gives you.  Most people will refer to this as a Linksys or Cisco router, but many manufacturers (D-Link, Belkin, Netgear, to name a few) make these devices.  Some are better than others, but they all serve the same basic purpose.  To everyone but gamers, these devices for the most part magically work and no thought ever really needs to be given to them.  But in our case, we care greatly about their capabilities.

How Xbox Live works

There are five ports that Xbox Live uses:

TCP 80, 443, 3074
UDP 88, 3074

The critical port of the bunch, though, is that UDP 3074 guy.  You see, not only do the servers operated by Microsoft care about him, but every other Xbox on the planet does as well.  Yes friends, Xbox Live is a peer-to-peer service as well as a client-server service.  Until recently, I was blissfully ignorant of this fact.  I thought, "That's simple, just allow the ports through your firewall and you're done."  Not so fast, slick.  Not only does the 360 phone home to Live and expect to carry on a conversation, but other 360s will want to talk to you as well.  This is the part where NAT becomes a pain in the neck for many people.

How NAT works

First off, there are three types of NAT:  Open (Full Cone), Moderate (Restricted Cone), and Strict (Symmetric).  For your Xbox to work at its best, you want to have an Open NAT.  Let's look at a little detail of what each of these types means in the real world.

To start with, all NAT types contain a state table.  This is the router's way of knowing what traffic coming in from the outside world is supposed to go to what address in the private, translated world.  The table, however, will be different depending on the NAT type.  The second thing that should be kept in mind is that by default in all NAT types, a client on the inside must initiate traffic before an entry is created in the state table.  Until that happens, no traffic from the outside can be forwarded to the inside.  In all the examples that follow, assume the following:

Your own internal IP address is 192.168.1.10
The IP you've received from your ISP is 10.1.1.10
The IP of the Xbox Live service is 172.16.1.36
The IP "Remote Live Player A" received from their ISP is 10.2.2.20
The internal IP address of "Remote Live Player A" is 192.168.2.20
The IP "Remote Live Player B" received from their ISP is 10.3.3.30
The internal IP address of "Remote Live Player B" is 192.168.3.30

Open NAT:

Open NAT is the simplest form of NAT.  It only cares about the internal client's information, and couldn't care less about where the traffic from the outside world is coming from.  An Open NAT's state table might look something like this:

192.168.1.10:3074 <-> 10.1.1.10:5000

This is the device saying "Hey, anything that comes in from the outside world to port 5000, it should go to 192.168.1.10 on port 3074."
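
For the code-minded, here is a minimal Python sketch of that lookup.  It's a toy model of the behavior described above, not how any real router is implemented.  The class name OpenNat and its methods are made up for illustration, and the public port is passed in by hand purely so the numbers match the example (a real router picks it for you).

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


class OpenNat:
    """Toy Open NAT: inbound traffic is matched on the public port alone."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.table: Dict[int, Endpoint] = {}  # public port -> internal endpoint

    def outbound(self, internal: Endpoint, public_port: int) -> Endpoint:
        """An inside client talks first, which creates the state table entry."""
        self.table[public_port] = internal
        return (self.public_ip, public_port)

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Forward to the mapped client no matter who the source is."""
        return self.table.get(public_port)


nat = OpenNat("10.1.1.10")
nat.outbound(("192.168.1.10", 3074), public_port=5000)

from_player_a = nat.inbound(("10.2.2.20", 10000), 5000)
from_player_b = nat.inbound(("10.3.3.30", 12000), 5000)
no_entry = nat.inbound(("10.2.2.20", 10000), 6000)

print(from_player_a)  # ('192.168.1.10', 3074)
print(from_player_b)  # ('192.168.1.10', 3074)
print(no_entry)       # None
```

Both remote players reach the 360 through port 5000, and anything aimed at a port with no entry goes nowhere.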

Moderate NAT:

Moderate NAT goes a step further in that it not only cares about the translation of your internal address to a public one, but it will only allow that translation to work specifically with a given remote port.  A Moderate NAT's state table might look something like this:

192.168.1.10:3074 <-> 10.1.1.10:5000 <-> [Any IP]:10000

This is the device saying "Hey, anything that comes in from the outside world going to port 5000 AND is from source port 10000, it should go to 192.168.1.10 on port 3074."

This introduces our first wrinkle in how Xbox Live works in cooperation with a NAT.  As you see in our examples, most NAT routers will take the port you're talking on, and use a completely random different port to talk to the outside world on.  Because of this, we can't know what port a conversation will come in from when another 360 tries to talk to us.  So when the following happens, we have a problem:

192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 10.2.2.20:10000 <-> 192.168.2.20:3074 = YAY!
192.168.1.10:3074 X-X 10.1.1.10:5000 <-> 10.3.3.30:12000 <-> 192.168.3.30:3074 = FAIL!

Remote Live Player A is able to talk with no issue because the connection is established and the 360 is aware of Player A.  However, Remote Live Player B is a sad panda, because the NAT router is expecting all traffic to come in from port 10000 instead of 12000.
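
The same YAY/FAIL result can be reproduced with a small Python sketch.  This models Moderate NAT exactly as I've described it here (the filter is on the remote port only); real routers vary, and the ModerateNat class and its method names are invented for this example.

```python
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


class ModerateNat:
    """Toy Moderate NAT: inbound must come from a remote port we already talked to."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.table: Dict[int, Endpoint] = {}          # public port -> internal endpoint
        self.allowed_ports: Dict[int, Set[int]] = {}  # public port -> remote ports seen

    def outbound(self, internal: Endpoint, public_port: int, remote: Endpoint) -> Endpoint:
        """Sending out records the mapping AND the remote port we sent to."""
        self.table[public_port] = internal
        self.allowed_ports.setdefault(public_port, set()).add(remote[1])
        return (self.public_ip, public_port)

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Forward only if the source port matches an entry; the source IP is ignored."""
        if source[1] not in self.allowed_ports.get(public_port, set()):
            return None
        return self.table[public_port]


nat = ModerateNat("10.1.1.10")
# The 360 has already talked to Player A, whose public port is 10000.
nat.outbound(("192.168.1.10", 3074), 5000, ("10.2.2.20", 10000))

player_a = nat.inbound(("10.2.2.20", 10000), 5000)
player_b = nat.inbound(("10.3.3.30", 12000), 5000)

print(player_a)  # ('192.168.1.10', 3074)
print(player_b)  # None
```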

Strict NAT:

Strict NAT goes even further, and specifically restricts source port AND IP.  A Strict NAT state table might look something like this:

192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 10.2.2.20:10000
192.168.1.10:3074 <-> 10.1.1.10:6000 <-> 10.3.3.30:12000

This is the device saying "Hey, anything that comes in from the outside world going to port 5000 AND is from address 10.2.2.20 AND is coming from port 10000, it should go to 192.168.1.10 on port 3074.  Also, anything that comes in from the outside world to port 6000 AND is from address 10.3.3.30 AND is coming from port 12000, that also should go to 192.168.1.10 on port 3074."

Here is where it really gets fun.  Here the NAT router, despite the fact that the communication is being made from the same port by the local client, creates another public facing port for the communication.  This is vastly more secure on the part of the NAT router, and for everyday life, preferable.  However, it's the Xbox Live service breaking equivalent of dumping Thermite into a steel furnace.  It's broken and NOBODY gets to use it!
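
Here is a rough Python sketch of that behavior, for anyone who would rather read code than state tables.  The StrictNat class is hypothetical, and it hands out public ports in a fixed sequence (5000, 6000, ...) only so the output lines up with the table above.  A real router picks them unpredictably, which is the whole problem.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]        # (ip address, port)
Flow = Tuple[Endpoint, Endpoint]  # (internal endpoint, remote endpoint)


class StrictNat:
    """Toy Strict NAT: one public port per (internal, remote) pair, exact-match inbound."""

    def __init__(self, public_ip: str, first_port: int = 5000, step: int = 1000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.step = step
        self.by_flow: Dict[Flow, int] = {}    # flow -> public port
        self.by_public: Dict[int, Flow] = {}  # public port -> flow

    def outbound(self, internal: Endpoint, remote: Endpoint) -> Endpoint:
        """Every new destination gets its own public-facing port."""
        flow = (internal, remote)
        if flow not in self.by_flow:
            self.by_flow[flow] = self.next_port
            self.by_public[self.next_port] = flow
            self.next_port += self.step
        return (self.public_ip, self.by_flow[flow])

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Forward only if source IP AND source port match the entry for this port."""
        flow = self.by_public.get(public_port)
        if flow is None or flow[1] != source:
            return None
        return flow[0]


nat = StrictNat("10.1.1.10")
xbox = ("192.168.1.10", 3074)
to_a = nat.outbound(xbox, ("10.2.2.20", 10000))
to_b = nat.outbound(xbox, ("10.3.3.30", 12000))

print(to_a)  # ('10.1.1.10', 5000)
print(to_b)  # ('10.1.1.10', 6000)
print(nat.inbound(("10.3.3.30", 12000), 5000))  # None: port 5000 belongs to Player A
print(nat.inbound(("10.3.3.30", 12000), 6000))  # ('192.168.1.10', 3074)
```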

NAT Hole Punching:

However, Microsoft employs some pretty smart people, and they foresaw this.  Even with an Open NAT type, the remote 360 needs a way to know which port to use to reach your 360.  So when you sign into Live, the service records the public port you're talking to it on.  In the case of an Open NAT, this is pretty much the end of the story, because when you are matched with another player, their 360 is told what port to talk to you on, and everything works:

You: Hey Live, I'm signing in!
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 172.16.1.36:3074
Live: Thanks User, I see you're on port 5000... I'll remember that for later.

Player A: Hey Live, I'm signing in!
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> 172.16.1.36:3074
Live: Thanks Player A, I see you're on port 10000... I'll remember that for later.

You and Player A then get matched in a game...
Live: Hey User, you're going to be matched with Player A.  Talk to him on port 10000.

192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 10.2.2.20:10000 <-> 192.168.2.20:3074

Gaming bliss is achieved.
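
The bookkeeping in that exchange is simple enough to sketch in a few lines of Python.  To be clear, this is my own toy illustration of the idea, not Microsoft's actual protocol or API.  The LiveService class, its method names, and the player names are all invented.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (public ip address, public port)


class LiveService:
    """Toy rendezvous server: remembers where each player's traffic came from."""

    def __init__(self) -> None:
        self.seen: Dict[str, Endpoint] = {}  # player name -> public endpoint observed

    def sign_in(self, player: str, observed_source: Endpoint) -> None:
        """Record the public IP and port the sign-in packet arrived from."""
        self.seen[player] = observed_source

    def match(self, first: str, second: str) -> Dict[str, Endpoint]:
        """Tell each player the endpoint to use for the other one."""
        return {first: self.seen[second], second: self.seen[first]}


live = LiveService()
live.sign_in("User", ("10.1.1.10", 5000))
live.sign_in("Player A", ("10.2.2.20", 10000))

talk_to = live.match("User", "Player A")
print(talk_to["User"])      # ('10.2.2.20', 10000)
print(talk_to["Player A"])  # ('10.1.1.10', 5000)
```

Note that Live only ever sees the public side of each NAT.  It never learns the 192.168.x.x addresses, and it doesn't need to.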

In a Moderate NAT situation, we've got a problem.  Since our NAT routers have determined a source port at random ahead of time, AND our router cares about which port the traffic arriving at it is coming from, AND we can't change the port of other devices talking to us, we've got to employ some trickery to make things work.  Let's look at the same conversation above and why it fails when the user's 360 attempts to talk to Remote Live Player A.  We've got the same conversation, but I'm going to show you the NAT state table that is created in addition to the network conversation:


You: Hey Live, I'm signing in!

User's network conversation:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 172.16.1.36:3074
User's NAT state table:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> [Any IP]:3074

Live: Thanks User, I see you're on port 5000... I'll remember that for later.

Player A: Hey Live, I'm signing in!

Player A's network conversation:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> 172.16.1.36:3074
Player A's NAT state table:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> [Any IP]:3074

Live: Thanks Player A, I see you're on port 10000... I'll remember that for later.

You and Player A then get matched in a game...
Live: Hey User, you're going to be matched with Player A.  Talk to him on port 10000.

Failed network conversation between User and Player A:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 10.2.2.20:10000 X-X 192.168.2.20:3074

Player A's NAT Router: Uh, some guy out there is trying to talk to me on port 10000 from port 5000... I don't have an entry for that.  GO AWAY!

Again though, we have some smart people at Microsoft, so for Moderate NAT users, we have this instead:

You and Player A then get matched in a game...
Live: Hey User, you're going to be matched with Player A.  Talk to him on port 10000.  By the way, he's not an Open NAT type, so you need to send him a packet first before the game can begin.
Live: Hey Player A, you're going to be matched with User.  Talk to him on port 5000.  By the way, he's not an Open NAT type, so you need to send him a packet first before the game can begin.


User sends a packet to Player A:
192.168.1.10:3074 -> 10.1.1.10:5000 -> 10.2.2.20:10000

User's new NAT state table:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> [Any IP]:3074
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> [Any IP]:10000

Player A sends a packet to User:
192.168.2.20:3074 -> 10.2.2.20:10000 -> 10.1.1.10:5000
Player A's new NAT state table:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> [Any IP]:3074
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> [Any IP]:5000

Network conversation between User and Player A:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 10.2.2.20:10000 <-> 192.168.2.20:3074

Again, we have gaming bliss.
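
If you want to watch the hole get punched, here is a small Python simulation of the exchange above.  It's a sketch under the same assumptions as the rest of this post: the ModerateNat class is a made-up toy that filters on remote port only, and the public ports are supplied by hand to match the example.

```python
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


class ModerateNat:
    """Toy Moderate NAT: inbound must come from a remote port we already sent to."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.table: Dict[int, Endpoint] = {}
        self.allowed_ports: Dict[int, Set[int]] = {}

    def outbound(self, internal: Endpoint, public_port: int, remote: Endpoint) -> Endpoint:
        self.table[public_port] = internal
        self.allowed_ports.setdefault(public_port, set()).add(remote[1])
        return (self.public_ip, public_port)

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        if source[1] not in self.allowed_ports.get(public_port, set()):
            return None
        return self.table[public_port]


LIVE = ("172.16.1.36", 3074)
USER_360 = ("192.168.1.10", 3074)
PLAYER_A_360 = ("192.168.2.20", 3074)

user_nat = ModerateNat("10.1.1.10")
a_nat = ModerateNat("10.2.2.20")

# Sign-in: each router now only allows inbound traffic from remote port 3074.
user_public = user_nat.outbound(USER_360, 5000, LIVE)
a_public = a_nat.outbound(PLAYER_A_360, 10000, LIVE)

# Before punching, Player A's router refuses the User's packet.
before = a_nat.inbound(user_public, a_public[1])

# The punch: each side sends one packet to the other's public endpoint.
user_nat.outbound(USER_360, 5000, a_public)
a_nat.outbound(PLAYER_A_360, 10000, user_public)

# After punching, traffic flows in both directions.
after_at_a = a_nat.inbound(user_public, a_public[1])
after_at_user = user_nat.inbound(a_public, user_public[1])

print(before)         # None
print(after_at_a)     # ('192.168.2.20', 3074)
print(after_at_user)  # ('192.168.1.10', 3074)
```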

So far, we have ways around different NAT types.  The problem comes when we get to the Strict type.  Since every connection out from a Strict NAT results in a new public facing port being used, Xbox Live can't use the same tactic of hole punching since we have no way of knowing what port will be used as the public facing port.  The whole process breaks down then when trying to hole punch:

You: Hey Live, I'm signing in!

User's network conversation:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 172.16.1.36:3074
User's NAT state table:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 172.16.1.36:3074

Live: Thanks User, I see you're on port 5000... I'll remember that for later.

Player A: Hey Live, I'm signing in!

Player A's network conversation:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> 172.16.1.36:3074
Player A's NAT state table:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> 172.16.1.36:3074

Live: Thanks Player A, I see you're on port 10000... I'll remember that for later.

You and Player A then get matched in a game...
Live: Hey User, you're going to be matched with Player A.  Talk to him on port 10000.  By the way, he's not an Open NAT type, so you need to send him a packet first before the game can begin.
Live: Hey Player A, you're going to be matched with User.  Talk to him on port 5000.  By the way, he's not an Open NAT type, so you need to send him a packet first before the game can begin.


User sends a packet to Player A:
192.168.1.10:3074 -> 10.1.1.10:6000 -> 10.2.2.20:10000

User's new NAT state table:
192.168.1.10:3074 <-> 10.1.1.10:5000 <-> 172.16.1.36:3074
192.168.1.10:3074 <-> 10.1.1.10:6000 <-> 10.2.2.20:10000

Player A sends a packet to User:
192.168.2.20:3074 -> 10.2.2.20:12000 -> 10.1.1.10:5000

Player A's new NAT state table:
192.168.2.20:3074 <-> 10.2.2.20:10000 <-> 172.16.1.36:3074
192.168.2.20:3074 <-> 10.2.2.20:12000 <-> 10.1.1.10:5000

Failed Network conversation between User and Player A:
192.168.1.10:3074 -> 10.1.1.10:6000 -X 10.2.2.20:10000
10.1.1.10:5000 X- 10.2.2.20:12000 <-> 192.168.2.20:3074

Player A's NAT Router: Uh, some guy out there is trying to talk to me on port 10000 from 10.1.1.10 port 6000... I don't have an entry for that.  GO AWAY!
User's NAT Router: Uh, some guy out there is trying to talk to me on port 5000 from 10.2.2.20 port 12000... I don't have an entry for that.  GO AWAY!
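
The same failure can be reproduced in a short Python simulation.  As before, this is only a sketch: the StrictNat class is hypothetical, and the port sequences (5000 then 6000 for the User, 10000 then 12000 for Player A) are hard-wired to match the trace above rather than chosen at random like a real router would.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]        # (ip address, port)
Flow = Tuple[Endpoint, Endpoint]  # (internal endpoint, remote endpoint)


class StrictNat:
    """Toy Strict NAT: new public port per destination, exact-match inbound."""

    def __init__(self, public_ip: str, first_port: int, step: int) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.step = step
        self.by_flow: Dict[Flow, int] = {}
        self.by_public: Dict[int, Flow] = {}

    def outbound(self, internal: Endpoint, remote: Endpoint) -> Endpoint:
        flow = (internal, remote)
        if flow not in self.by_flow:
            self.by_flow[flow] = self.next_port
            self.by_public[self.next_port] = flow
            self.next_port += self.step
        return (self.public_ip, self.by_flow[flow])

    def inbound(self, source: Endpoint, public_port: int) -> Optional[Endpoint]:
        flow = self.by_public.get(public_port)
        if flow is None or flow[1] != source:
            return None
        return flow[0]


LIVE = ("172.16.1.36", 3074)
USER_360 = ("192.168.1.10", 3074)
PLAYER_A_360 = ("192.168.2.20", 3074)

user_nat = StrictNat("10.1.1.10", first_port=5000, step=1000)
a_nat = StrictNat("10.2.2.20", first_port=10000, step=2000)

# Sign-in: Live sees ports 5000 and 10000 and hands them out at match time.
user_seen_by_live = user_nat.outbound(USER_360, LIVE)
a_seen_by_live = a_nat.outbound(PLAYER_A_360, LIVE)

# Attempted punch: each router allocates a NEW public port for the new destination.
user_source = user_nat.outbound(USER_360, a_seen_by_live)
a_source = a_nat.outbound(PLAYER_A_360, user_seen_by_live)

# Each packet arrives at a port whose only entry is for Live, so it is dropped.
at_player_a = a_nat.inbound(user_source, a_seen_by_live[1])
at_user = user_nat.inbound(a_source, user_seen_by_live[1])

print(user_source, a_source)  # ('10.1.1.10', 6000) ('10.2.2.20', 12000)
print(at_player_a, at_user)   # None None
```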

In Ghostbusters speak, we have a crossing of the streams.  We made our connections out, but because our source port changed, we aren't listening on the ports that the conversation is occurring on.  As a result, those with Strict NAT types are pretty well up the creek when being matched.  They have to rely on being matched with ONLY Open NAT types.  Only an Open NAT will work with a Strict, because it doesn't care where the conversation is coming from, just as long as it's on the right port.  Since Live can inform the Strict user of this, the Strict user initiates the connection to the Open user, and there is no problem.  But because the Strict user's public port changes with every new connection, even a Moderate user cannot talk with a Strict user, since they have no way of knowing what port to talk to the Strict user on.
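
Boiled down, the matching rules described so far fit in one tiny function.  This Python snippet just encodes the outcomes from this post (it is not anything Live itself exposes), and the names NatType and can_play_together are my own.

```python
from enum import Enum


class NatType(Enum):
    OPEN = "Open"
    MODERATE = "Moderate"
    STRICT = "Strict"


def can_play_together(a: NatType, b: NatType) -> bool:
    """Return True if two players can connect directly, per the rules above."""
    if NatType.OPEN in (a, b):
        return True   # an Open NAT accepts traffic from anyone on the right port
    if NatType.STRICT in (a, b):
        return False  # the Strict side's public port can't be predicted
    return True       # Moderate with Moderate works via hole punching


for a in NatType:
    print(a.value, [b.value for b in NatType if can_play_together(a, b)])
# Open ['Open', 'Moderate', 'Strict']
# Moderate ['Open', 'Moderate']
# Strict ['Open']
```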

Oh no, that's terrible, how do you fix it?

Well, there are a couple of ways to fix this for a home user.  Newer NAT routers, the ones I mentioned above as being "better than others," contain a technology called Universal Plug and Play, or UPnP for short.  UPnP, without getting into another highly technical explanation, lets the 360 ask the router to open the port it needs, so the traffic reaches the proper destination without any user intervention.  This is by far the easiest fix.  It will also enable you to have more than one 360 using Live at a time in your household.  It is also security stupidity.  Allowing a device to create firewall rules simply by virtue of being behind the firewall is pretty foolish, and this is why nearly no enterprise firewall supports this behavior.

However, it can be fixed on some other NAT routers as well.  The key is that UDP 3074 guy once again.  Since this is the ONLY port that some random 360 will talk to you on, you can use this information to your advantage.  One way of fixing this on a non-UPnP enabled router is to add the 360 to the DMZ, if the router has one.  This will cause most NAT routers to forward any traffic that doesn't match a valid NAT state table entry to the DMZ.  However, this isn't always the case.  If your router behaves somewhat differently than most, and uses the local source port as the public facing source port, your easy solution here is to redirect UDP 3074 to the 360.  This has the unfortunate side effect that only one 360 will be working AT ALL in your household, should you happen to have more than one.
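
To see why a port redirect only helps one console, here is a toy Python model of a router's static forwarding rules.  The PortForwards class is invented for illustration, and real routers differ in how they handle a duplicate rule (some reject it, some silently overwrite the old one).  Either way, a given public port can only point at one internal address.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)
RuleKey = Tuple[str, int]   # (protocol, public port)


class PortForwards:
    """Toy static port forwarding table: one internal target per public port."""

    def __init__(self) -> None:
        self.rules: Dict[RuleKey, Endpoint] = {}

    def add(self, protocol: str, public_port: int, internal: Endpoint) -> None:
        key = (protocol, public_port)
        if key in self.rules:
            raise ValueError(f"{protocol} {public_port} is already forwarded to {self.rules[key]}")
        self.rules[key] = internal

    def lookup(self, protocol: str, public_port: int) -> Optional[Endpoint]:
        """Unlike a state table entry, this works even if the 360 never spoke first."""
        return self.rules.get((protocol, public_port))


forwards = PortForwards()
forwards.add("udp", 3074, ("192.168.1.10", 3074))  # first 360: fine

try:
    forwards.add("udp", 3074, ("192.168.1.11", 3074))  # second 360: no room
    second_console_ok = True
except ValueError:
    second_console_ok = False

print(forwards.lookup("udp", 3074))  # ('192.168.1.10', 3074)
print(second_console_ok)             # False
```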

That seems reasonable, what's your problem then?

Well, my problem, as a network admin, is twofold.  Assigning a reserved DHCP address and setting up port redirection from an external address to that reserved internal address for every 360 on my network is an administrative nightmare, and beyond that, I simply don't have that many public IP addresses available.  If I did, I wouldn't be using NAT, and I wouldn't be making this post.  Now granted, my need to support Xbox Live as a service on my network isn't a terribly common one, but it is certainly one that Microsoft should care about.  I'm sure that colleges, some small ISPs, and even some home users with several 360s run into this problem quite frequently.  With my specific network setup, I can make a single 360 absolutely sing without much effort.  The problem is getting *all* of them in harmony.  I can't fathom why, knowing that the NAT problem exists, and requiring every user to fork over $60 each year (only suckers pay full price for Live, by the way), Microsoft doesn't have all traffic handled by either the Live servers or a required dedicated server owned by the developer of the game.  Granted, IPv6 should fix this problem, but we are some time off before it is widely implemented, and currently I'm unaware of any gaming device at all that even supports it.

So for now, we're stuck with NAT.  But now you know what's going on if your 360 is complaining about a Strict or Moderate NAT type, and what you potentially can do about it!
