Dev Diary: The Hunt for the One Sync Bug!

Sync Bugs Part 2

 

Tired but Wired!

 

Well I have to say I really, really detest sync bugs - probably more than all of you combined. But as of this moment I am elated despite being utterly exhausted (it’s 4:04am as I type this sentence) because I’ve found and tracked a sync bug that has been in Sins for a VERY long time. One of the great joys of making games is tracking down the nastiest bugs, the kind that hound you for days, months and even years, the kind that constantly show up in your support email, the kind that always end up in the top ten posts of the forums. Sync bugs are the worst.

 

I’m going to explain the cause of the sync bug, how it was tracked, and how it was fixed. It might be easier to understand if you take a look at my last dev diary where I spoke at length about sync bugs. You can find it here: https://forums.sinsofasolarempire.com/331090.

 

Cause and Effect

 

The sync bug was caused by the second type listed in the previous dev diary: mixing non-deterministic code with deterministic code. Sins uses two random generators, a deterministic one and a non-deterministic one (DRandom and NDRandom as we call them in code). DRandom is always called by stuff that affects the simulation so that the same results are achieved on everyone’s computers without having to transmit data (it’s a property of the math: the same seed always produces the same sequence of numbers). NDRandom is called by stuff that doesn’t affect the simulation, like the random direction particles shoot off in an explosion or the flickering of the exhaust. You can’t mix them, but unfortunately we did!
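
To make the two-generator split concrete, here is a minimal sketch in Python. It is not Sins' actual code: the class, method names and value ranges are made up for illustration, and only the DRandom/NDRandom idea comes from the text above. The deterministic stream is seeded with a value every player shares; the non-deterministic stream is seeded differently on every machine.

```python
import random


class Simulation:
    """Toy peer with two random streams (hypothetical names, not Sins' API)."""

    def __init__(self, shared_seed):
        # Deterministic stream: every peer uses the same agreed seed.
        self.d_random = random.Random(shared_seed)
        # Non-deterministic stream: seeded from OS entropy, differs per machine.
        self.nd_random = random.Random()

    def roll_damage(self):
        """Affects the simulation, so it must draw from the deterministic stream."""
        return self.d_random.randint(10, 20)

    def particle_angle(self):
        """Purely cosmetic, so it may draw from the non-deterministic stream."""
        return self.nd_random.uniform(0.0, 360.0)


def run(shared_seed, rolls=5):
    """Interleave cosmetic and simulation draws; return only the simulation results."""
    sim = Simulation(shared_seed)
    results = []
    for _ in range(rolls):
        sim.particle_angle()  # cosmetic draw, does not touch d_random
        results.append(sim.roll_damage())
    return results
```

Calling `run(42)` twice, or on two different machines running the same Python version, should give the same damage rolls, even though the particle angles differ every time.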

 

A Barrage of Problems

 

In Sins, when certain classes of abilities are activated, the effect you see comes from hidden “buff points” on the mesh. For example, when the ultimate ability of the Marza (“Missile Barrage”) fires, you see lots of missiles shoot out of the right-side missile racks, which are defined as a little grid of “buff points”. When each individual missile goes to launch, one of those invisible points is randomly selected as the missile’s origin. Because this is an effect, it’s supposed to be using the NDRandomStream to make that point selection, and it is. However, the randomly selected point is then used in the math that calculates the time it will take the missile to damage the target. That time then becomes part of the ability’s simulation. Based on this, different computers are going to generate different results for the time it will take the missile to hit the target, and they may go out of sync. Of course, Missile Barrage isn’t the only culprit. Any ability that is built using ApplyBuffToTargetWithTravel has a chance to go out of sync. This would include things like EMP Blast, Gauss Blast and many more. However, it turns out that for the majority of them the probability is actually zero. The chance of an ability going out of sync depends on the number of buff points. Most of these abilities only have one buff point, so it doesn’t matter which random generator we use: the same result will always occur as there is only one point to pick! The Marza’s Missile Barrage is more likely to go out of sync because it’s made up of a bunch of points.
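
The failure pattern described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the 3x3 grid, the target position, the missile speed and the function names); it is not the real ApplyBuffToTargetWithTravel code. The point is only that a cosmetic random pick ends up feeding a simulation value.

```python
import math
import random

# Hypothetical data: a 3x3 grid of buff points on the missile rack and one target.
BUFF_POINTS = [(float(x), float(y)) for x in range(3) for y in range(3)]
TARGET = (100.0, 0.0)
MISSILE_SPEED = 25.0  # assumed units per second


def travel_time(origin):
    """Simulation value: time for a missile to fly from origin to TARGET."""
    return math.dist(origin, TARGET) / MISSILE_SPEED


def buggy_barrage(points, nd_random, missiles):
    """The buggy pattern: a non-deterministic pick feeds a simulation value."""
    return [travel_time(nd_random.choice(points)) for _ in range(missiles)]


def mismatch_probability(point_count):
    """Chance that two peers, picking uniformly and independently, disagree."""
    return 1.0 - 1.0 / point_count
```

Two peers can be imitated by passing differently seeded generators, for example `buggy_barrage(BUFF_POINTS, random.Random(1), 50)` and `buggy_barrage(BUFF_POINTS, random.Random(2), 50)`: the travel times diverge. With a single buff point (`BUFF_POINTS[:1]`) both calls agree no matter how the generator is seeded, which matches the zero-probability case. Under the uniform, independent picks assumed here, two peers disagree on one missile with probability 1 - 1/n for n buff points, so the risk rises quickly with the first few extra points and then levels off.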

 

Delta Force

 

Detecting a de-sync is conceptually as simple as comparing the complete simulation state of the game on one person’s computer against that of another’s. This is pretty expensive (i.e., it would slow your computer to a crawl) so we add up all the state values into a “checksum” and then just compare that. Every update, each person sends his or her checksum to the host and the host compares them. If they differ, they are out of sync. Once you know they are out of sync, you need to determine the delta, or the difference, in the states to see where they diverged. I put a post up asking people to enable “snapshots” and to turn the detail up to 3. What this does is write out a file that contains a very detailed snapshot of that person’s game state at the time the de-sync was detected. Once all players send me the files I can throw them into a special program that shows me the delta between them (I use Beyond Compare, which I highly recommend for a wide variety of game development related tasks). The delta tells me what diverged and I can then start looking at the cause. The problem is getting the de-sync to happen in the first place!
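
The checksum-then-delta workflow can be sketched roughly as follows. This is an illustration under stated assumptions, not the Sins implementation: the game state is modelled as a flat dictionary, the field names are invented, and CRC32 from Python's `zlib` stands in for the "add up all the state values" checksum described above. The delta function plays the role that a file-comparison tool plays on the real snapshot files.

```python
import zlib


def checksum(state):
    """Collapse a state dict into one number that is cheap to send and compare."""
    blob = repr(sorted(state.items())).encode("utf-8")
    return zlib.crc32(blob)


def find_desynced(checksums_by_player, host):
    """Host-side check: list the players whose checksum differs from the host's."""
    host_sum = checksums_by_player[host]
    return [p for p, c in checksums_by_player.items() if c != host_sum]


def snapshot_delta(a, b):
    """Compare two detailed snapshots and report only the fields that diverged."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For example, if the host's snapshot has `{"missile_7.impact_time": 3.98, "marza_1.hull": 2400}` and another player's has `3.99` for the same missile, the checksums differ, `find_desynced` flags that player, and `snapshot_delta` returns just the impact-time field with both values.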

 

The Hunt for the Ring

 

One of my favourite board games is War of the Ring, which is based on Lord of the Rings. In this game the evil guys get to put aside special Hunt Dice that increase their chances of tracking down the ring-bearer and his hidden fellowship. The more dice you put in there, the better your chances. Tracking sync bugs is like tracking Frodo and Sam wearing those elven cloaks and having Gollum lead them through all the secret hidden paths, but we’ve only got half a die, which I suppose symbolizes half of one of the nine Nazgul’s horses. We are basically screwed, the ring gets thrown into the lava time and again, and we receive nasty emails that would do Sauron proud. In order to stop this vicious cycle we’ve called upon the masses of the internet to become our Hunt Dice and track the ring-bearer down! Luckily, a few special Nazgul found Frodo and sent us their reports so we could kill him and take back the ring.

 

Special thanks to the following nine Dark Riders:

 

Cvon

Mvon

Argus_I

Ark

arkkiller

Igilima

Skywarp

Gu1do

Telamon

 

(Yes, by pure coincidence there were exactly 9 people able to recreate it and send in valid snapshot files.)

 

Out Damn Spot!

 

Not much to say here. I simply made sure the buff point was selected using the Deterministic Random Generator instead of the Non-Deterministic one. Expect both original Sins and Sins: Entrenchment to be patched up with this fix shortly!
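
As a rough sketch of what that change amounts to (hypothetical names and made-up numbers, not the patched Sins code): the origin pick moves to the shared deterministic stream, so every peer derives the same travel times.

```python
import math
import random


def barrage_times(points, target, speed, d_random, missiles):
    """Pick each missile's origin from the shared deterministic stream."""
    return [math.dist(d_random.choice(points), target) / speed
            for _ in range(missiles)]


def simulate_peer(shared_seed):
    """One peer's view of a 12-missile barrage (all values are made up)."""
    points = [(float(x), float(y)) for x in range(3) for y in range(3)]
    d_random = random.Random(shared_seed)  # seed agreed by all peers at game start
    return barrage_times(points, (100.0, 0.0), 25.0, d_random, 12)
```

`simulate_peer(7)` returns the same list on every call, which is the property the simulation needs. One consequence worth keeping in mind with this approach is that the pick now consumes draws from the deterministic stream, so every peer has to make it the same number of times and in the same order.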

 

Thanks for reading,

Blair

Reply #1 Top

As a network engineer myself, I know what it feels like to track down that needle in a haystack, or worse, in a stack of needles depending upon what the problem is, and I certainly don't envy you code writers trying to debug code to find that one bug that causes so much grief. But when you find and replace that one bad cable, or that one jabbering NIC, and harmony returns to your environment, it just makes you sit back and smile.

Well done Blair, looking forward to the patch.

Gu1do

PS:  I knew that 18 months working as a QA Analyst (my 1st job in the IT field) would pay off somewhere down the line! :)

 

     

Reply #2 Top

I praise you! :D You can have half of one of my children! I will finally be able to play my first full match with someone without it going out of sync now hopefully, once the patch comes out.

Reply #3 Top

After all this time, I'd think the first reaction would be to beat up the offending piece of code...

Way to go.

 

:fox:

Reply #4 Top

Wicked Cool!!!!.... What a strange bug, but even more strange, I understood the explanation.  :)

Sooooooo glad to hear this and so glad that all those desynced games were ultimately useful in helping determine where the problem existed. I cannot wait to play end game strategy... or play a MultiPlayer map with more than 20 planets... or... move in a different direction than my ally to expand my empire, Muhuhahahaha so many choices, so little time. Can't wait, can't wait!

Hold a sec... I think I need a moment...<single tear of joy lazily rolls down my cheek>....<chanting mantra - I am a man - I am a man!>

Seriously THANK YOU and WELL DONE hehe, that darn ring was mighty elusive! 

woo hoo and congrats on finding the prob

Cvon and Mvon 

 

Reply #5 Top

Because it's come up a few times: the Marza's Missile Barrage isn't the only cause of the sync bug - it's just an example of one of them. You can still go out of sync even if no one has built a Marza (e.g. using the Vasari's Phase Missile Swarm).

Also, there may very well be other sync bugs that haven't appeared yet. We won't know for sure until we get this patch out.

Reply #6 Top

Great read as always, Blair. :P

Reply #7 Top

Also, there may very well be other sync bugs that haven't appeared yet. We won't know for sure until we get this patch out.

 

Why does this scare me?

Reply #8 Top

Passes 2 fine joints and a fifth of greygoose.

Reply #9 Top

Blair, would not a simpler solution be a jump in the entry point for NDRandom to DRandom?

harpo

Reply #10 Top

Good job Blair.  We're glad we could help.

4AM Saturday morning and still working!  Now that's dedicated.  I think my friends and I were saving up our ongoing game of "Civ 4: Fall from Heaven 2" around 4AM Saturday morning my time, lol.

I'm a software engineer so I completely understand the entire ordeal as well as the final relief. We had a bug resolved this last week that I spent almost 2 weeks trying to figure out. I work in C# and use interop with a 3rd party C++ component. Apparently the bug has been there over 2 years but rarely if ever happened on XP. It wasn't until I had to make everything Vista-compatible that it turned into a show-stopper. It kept crashing with a heap corruption (and no usable breakpoint) on one call to the unmanaged code, but only on Vista or only if I enabled unmanaged code debugging on XP. Turns out that you shouldn't treat a C++ pointer to a structure as a ref structure on the C# side of a function call.

Here's hoping there are no more sync bugs but if there are then we'll do our best to reproduce them for you, lol.

Reply #11 Top

Although I never had sync bugs well done there and thanks for posting!

It is just amazing to see so much passion and dedication these days so here is another Karma for you sir :p

 

Reply #12 Top

I must say, I'm not sure if this is the only bug. My friend and I were playing on Hyperion's (spelling?) in a 2vs2vs2vs2 of us vs easy computers. It appeared everything was going fine... until a huge army came to attack me and he only saw one guy. Coincidentally, before that we finished up a 2 vs 2 with easy opponents on Doppelgangers. That one went without a hitch from start to finish. I'm going off of idiot knowledge here, but is it possible that autosaves (and saves for that matter) could desync too? What would happen if during an autosave, they save different states and when you fire it up again, the new game is desynced from the get-go? Just my 2 cents.

Reply #13 Top

Nominally, it could be easier to just have one computer call up the autosave for both systems. (or it might not)

Reply #14 Top

Hi, is this fix included in the recent 1.16 Patch? I couldn't find it mentioned in the change log ;)

Reply #15 Top

I'm not sure if it has been fixed. There is another thread (https://forums.sinsofasolarempire.com/331090/get;2127279) that I've been posting on but there isn't a fix yet over there either. I'm hoping to find the elusive 1.16 Sync Logger so my friends and I (who are all having this issue) can help try and track this thing down.

Plus, I want my name with a "Special Thank You" from Blair and to kill that stupid Frodo. I mean seriously, who doesn't wear shoes?!?! Stupid shoeless code. :P

Reply #16 Top

Me and my friend just played our first online game yesterday, we 2 vs 2 comps on Doppelgangers. He has the german version and I have the english one, maybe that could be causing problems?

Anyways, some time in the game we desynced. I was curious when I told him to attack some forces in my solar system and he asked where they are.

I then saw that they were actually attacking his system, a whole fleet at his sun. I told him and he told me he only saw one scout there.

I was really like wtf but we got on, as we didn't know this bug yet.

Later he told me to come help as he was being attacked by the other comp. I came and couldn't find any enemies. I then razed a colony of the enemy in his system (a dead asteroid my friend ignored) while he said that I was currently fighting the enemy fleet and my shields were down. I looked at the peaceful picture in front of me and my full shields...

This was when we realized something had happened. I looked for similar things a lot and finally found this and the other diary entry about it.

Still, we have sync errors in the latest versions. We could aid in killing this bug, if we can reproduce it. And I bet we can as we are those friends that always seem to have problems playing together online... (mysterious things)

Reply #17 Top

I can say this issue has been seen in both of the online games I played with a friend of mine. We recently picked up the game and are having a blast. It's good fun to set up games against a whole bunch of AI which can last for hours.

Our previous game lasted about 8 hours and the desync happened once or twice there. It was easily fixed by loading an autosave game. Now we are in a much bigger game and after about 6 hours of play it desynced again, and it's hard to decide which savegame to load to make sure it hadn't already happened. But it seems impossible to play a big game vs AI without this problem happening.

 

Btw, running Entrenchment, latest version