Networking Weirdo's

uunix
Donor
Posts: 1629
Joined: Sun Mar 27, 2011 12:48 pm
Location: Stourbridge / England / UK


Post by uunix » Fri Feb 24, 2017 1:52 pm

So over the years, like many of you, I've come across strange, intermittent network issues that have been hard to track down, but normally end up being something plugged in many years ago.

Tonight I had just one of those issues. For a few weeks now my network, or rather my internet, has been playing up. Wife complaining, Outlook failing to connect to its autodiscover, DNS failing on the DC. Everything worked fine internally, but the internet was playing up, though not all the time: a glitch about every 12 hours or so.

Restarting the DNS service on the Windows server cured it sometimes, whereas rebooting the router cured it other times. I changed the forwarder on the DNS server; that seemed OK for two days, then it was down again. I started thinking something internal must be flooding the DNS service... until tonight, when I lost Outlook and decided to look at the router again, but this time the router address showed me a different page.

A year or so back I had bought a network-controllable power switch, as I'd had a machine that was hanging every so often and required a power reboot. This was ideal and allowed me to restart the machine from work. I cured the hanging issue on the machine; turns out it didn't like its RDP port being redirected. Anyhoo, I'd forgotten about that little box, and it turns out it must have reset its IP back to default a few weeks back, which just so happens to be the same as my router's. Hence I was surprised to be greeted by the PSU web interface instead of my router! Problem solved!
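For anyone chasing a similar address clash, here is a minimal sketch of the check that would have caught it: look for one IP being claimed by more than one MAC address. The function name, the addresses, and the MACs are all made up for illustration; in practice the (IP, MAC) pairs would have to come from ARP tables or packet captures taken at different times.

```python
from collections import defaultdict


def find_ip_conflicts(arp_entries):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        # Normalise case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations (all values invented for this example).
observations = [
    ("192.168.1.1", "AA:BB:CC:00:00:01"),   # the router
    ("192.168.1.10", "AA:BB:CC:00:00:10"),  # some workstation
    ("192.168.1.1", "AA:BB:CC:00:00:99"),   # power switch back on its default IP
]

conflicts = find_ip_conflicts(observations)
print(conflicts)
```

Any IP that shows up in the result is worth a closer look, since two devices are answering for it.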

So, what's your story?
-----------------------------------------------------------------------
Hey Ho! Pip & Dandy!
:Fuel: :Octane2: :O2: :Indigo: :Indy:
-----------------------------------------------------------------------

devv
Posts: 124
Joined: Sat Jun 30, 2012 6:04 pm

Re: Networking Weirdo's

Post by devv » Fri Feb 24, 2017 2:08 pm

In my case, a couple of months ago the Internet link and the IPTV (both provided through the same cable and provider) started freezing up at random times. Sometimes it happened 10-12 times in a row with 5 minutes in between; sometimes everything worked and I had an issue only once in 4-5 days.

In the beginning, it was TV that had issues more often than the Internet link. Then the other way around. Then a mix of both. In all cases, restarting the router provided by the ISP fixed it.

The current behavior is that the Internet works until I turn on the TV. When I turn on the TV, the IP address, network address and DHCP settings on the router change, and the Internet stops working for all computers.

If I manually change my computer's IP to match the different settings, sometimes Internet works through it, and sometimes it doesn't.

In any case, rebooting the router brings things back to their initial default state (old/known-good IP, network and DHCP settings.)

(Also it's interesting that when the problem happens, the IP/network/DHCP settings always change to the same wrong setting, so it doesn't seem like a random corruption in data.)

I've tolerated this for long enough that I'm gonna call support to come and fix this (probably replace the router?) next week.
oOoO :Tezro: oOoO

Elf
Donor
Posts: 73
Joined: Wed Oct 19, 2016 9:54 pm
Location: Pacific Northwest (US)

Re: Networking Weirdo's

Post by Elf » Sat Feb 25, 2017 4:46 am

I was at a service provider that used Cisco 4948s as customer facing L3 row switches in a moldy last-gen network for a colocation facility. One day, across the board, they all degraded back to what Cisco calls "process switching," meaning forwarding all traffic using their CPU rather than the hardware ASICs. Cisco is notoriously cheap with their CPUs in switches and routers (which are generally only used for the control plane, rather than handling traffic) and I think most modern smartphones have more horsepower than the supervisor CPU in these things. So, of course it didn't go well. A reboot solved it, but it wasn't clear why dozens of them across two datacenters suffered from the same failure at the same time.

Looking into it, the routing tables were close to, but not quite, overloaded (they had some minor portion of the Internet BGP table), but that alone didn't seem to be the issue. Much puzzlement later, netflow data showed that there were some pretty aggressive scans across the address space that they served, at exactly the time of the failure. Just Internet noise; people looking for stuff to compromise, although the volume and rate of the traffic were much larger than average. I hypothesized that somehow the scans contributed to the failure, and it turned out I was right.

Eventually I was able to consistently reproduce it in a lab. Trying to talk to an IP address the router doesn't have in its ARP table causes it to send an ARP asking what Ethernet MAC address associates with that IP. When a response is received, the IP to MAC address mapping is stored as a forwarding entry in the ASIC TCAM, just like an IP route. Cisco would call it a forwarding "adjacency." The ASICs, of course, have a limited amount of space for forwarding entries, usually stored in expensive TCAM memory. A high-rate scan across all of our address space precipitated the worst case of that, with all active IPs in all subnets now in the forwarding table, which, with the already heavy route table, overloaded the capacity of the ASICs.

A sane failure mode would be to stop trying to stuff new forwarding entries into TCAM and just leave the existing set in there, dropping any new adjacencies. This is what Juniper does. A not-so-great failure mode is just to abandon hardware forwarding entirely, and to never recover, even when the table shrinks back to a size that could be contained in hardware. This is what Cisco does. Just another step along the road of becoming completely disenchanted with Cisco's poor device behavior and their increasingly lacking R&D. Eventually I found that someone else had found similar results, although only with regard to overloading the routing table: http://www.blackhole-networks.com/OSPF_overload/#test_1
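To make the difference between those two failure modes concrete, here is a toy model, not a description of any real firmware. The capacity, route count, adjacency count, and the policy names "keep" and "fallback" are all invented for the example; real TCAM allocation is far more involved.

```python
def simulate(capacity, routes, new_adjacencies, policy):
    """Toy model of a fixed-size hardware forwarding table.

    policy "keep":     when full, refuse new entries but keep forwarding in hardware.
    policy "fallback": when full, abandon hardware forwarding (sticky, no recovery).
    """
    used = routes        # entries already taken by the route table
    hardware = True      # still forwarding in hardware?
    dropped = 0          # adjacencies refused under the "keep" policy
    for _ in range(new_adjacencies):
        if not hardware:
            break                  # already fell back to CPU forwarding
        if used < capacity:
            used += 1              # adjacency fits, install it
        elif policy == "keep":
            dropped += 1           # table full: skip this one, keep the rest
        else:
            hardware = False       # table full: give up on hardware forwarding
    return {"hardware": hardware, "used": used, "dropped": dropped}


# Hypothetical numbers: a table of 1000 entries, 900 already used by routes,
# and a scan that makes the switch learn 150 new adjacencies.
keep_result = simulate(1000, 900, 150, "keep")
fallback_result = simulate(1000, 900, 150, "fallback")
print(keep_result)
print(fallback_result)
```

With the first policy only the 50 adjacencies that did not fit are affected; with the second, every flow ends up on the CPU as soon as the table overflows.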

Thinking of it, more than half of my "networking oddity" stories just boil down to bad decisions made by Cisco. Stories about unconfigurable "errdisable" functionality causing multiple Ethernet switches in a transport network to shut down their uplinks all at once. Stories about the awful MPLS VPN implementation in IOS 12 and 15, filling the LSP forwarding table with garbage and unable to properly handle IPv6 in an L3VPN... Terrible stuff.
:Indy: :Indy: :Indy: :Indigo2: :Indigo2IMP: :O2: :O2: :Octane: :Octane: :Fuel: :Tezro:
:Indy: [x19] :Indigo: [x7] :O2: [x4]

Shiunbird
Donor
Posts: 269
Joined: Fri May 06, 2016 1:43 pm
Location: Czech Republic

Re: Networking Weirdo's

Post by Shiunbird » Sun Feb 26, 2017 2:44 am

Two things that got resolved last week at work:

- Audio calls (VoIP G.711, because our phones are old) having robotic sound. After two weeks of trying to figure it out, we found that one of the switches had rebooted with a previous static ARP table, with the 1st and the 3rd hop being the same NIC in one of the switches. Since all the traffic is UDP, it was arriving at the destination in a chaotic order, making the audio sound like crap. Took a long time to find this one.

- Packet loss in video calls happening randomly, making the thing unusable for a whole big office. Months of troubleshooting, escalating to Cisco and Palo Alto, no go. Then one network engineer who happens to work from that office returned from his two-month-long honeymoon and resolved the issue in 30 minutes: there was a QoS rule for a secondary internet link that used to be a fail-over backup, but nowadays it's used in parallel with the main internet link. So calls that were flowing through that secondary link were affected by the QoS policy, and calls going through the primary one were not. Since the two links are now bound together in a virtual circuit, no one assumed legacy rules that were applied to the individual links would affect anything.
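On the first item, a rough way to confirm that UDP audio is arriving out of order is to count late packets by sequence number. This is only a sketch: it assumes the stream carries a per-packet sequence number (as RTP does), the function name and sample data are made up, and it ignores the wrap-around of RTP's 16-bit counter.

```python
def count_out_of_order(arrival_seq):
    """Count packets whose sequence number is lower than one already received."""
    highest = None
    late = 0
    for seq in arrival_seq:
        if highest is not None and seq < highest:
            late += 1          # arrived after a newer packet: out of order
        else:
            highest = seq      # new high-water mark
    return late


# Hypothetical arrival order at the receiver (invented for this example).
arrivals = [1, 2, 4, 3, 5, 7, 6, 8]
late_packets = count_out_of_order(arrivals)
print(late_packets, "of", len(arrivals), "packets arrived late")
```

A steady stream of late packets on a path that should be a single route is a hint that traffic is taking more than one way through the network.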

We live, we learn.

