Bang Bang Sounds Like Machinery

Saturday, 6 April 2013

Traceroute through Cisco PIX / ASA

I recently had to clear and redeploy a PIX firewall to a new location, and realised that I had forgotten some of the subtleties involved in getting management and troubleshooting tools to work properly. So this is more of a note to self.

Windows tracert is fairly straightforward and uses pure ICMP with incrementing TTL values. Linux traceroute with the -I switch works the same way.

The firewall is required to allow the following:
Outbound
- Echo Request

Inbound
- Echo Reply
- Time-Exceeded (needed for TTL=0 responses)

Cisco and Linux traceroute by default use incrementing UDP destination ports (starting at 33434) and incrementing TTL values.

The firewall is required to allow the following:
Outbound
- UDP ports 33434 - 33464

Inbound
- Time-Exceeded (needed for TTL=0 responses)
- Destination Unreachable (needed for the final hop's port-unreachable response)
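The two rule sets follow directly from what each kind of probe provokes. A small Python sketch of that mapping, for illustration only: `classify_reply` is a hypothetical helper, and the type and code numbers are the standard ICMPv4 values.

```python
# Hypothetical helper: what a returned ICMP message means to the tracing host.
# Type/code numbers are the standard ICMPv4 values.
ECHO_REPLY = 0
DEST_UNREACHABLE = 3
TIME_EXCEEDED = 11
CODE_PORT_UNREACHABLE = 3        # code within DEST_UNREACHABLE
CODE_TTL_EXCEEDED = 0            # "TTL exceeded in transit", within TIME_EXCEEDED


def classify_reply(probe: str, icmp_type: int, icmp_code: int) -> str:
    """probe is 'icmp' (tracert, traceroute -I) or 'udp' (default traceroute)."""
    if icmp_type == TIME_EXCEEDED and icmp_code == CODE_TTL_EXCEEDED:
        return "intermediate hop"            # needs Time-Exceeded inbound
    if probe == "icmp" and icmp_type == ECHO_REPLY:
        return "destination reached"         # needs Echo Reply inbound
    if (probe == "udp" and icmp_type == DEST_UNREACHABLE
            and icmp_code == CODE_PORT_UNREACHABLE):
        return "destination reached"         # needs Unreachable inbound
    return "unexpected"


if __name__ == "__main__":
    print(classify_reply("icmp", 11, 0))
    print(classify_reply("udp", 3, 3))
```

Whatever the firewall drops simply never arrives, which is why a blocked return path shows up as rows of asterisks rather than as an error.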

Putting it all together we get a rule set that looks something like this:


object-group icmp-type ICMP-returns
 description Legit ICMP responses
 icmp-object echo-reply
 icmp-object time-exceeded
 icmp-object unreachable

object-group service Cisco_Traceroute_udp udp
 port-object range 33434 33464

access-list outside_access_in extended permit icmp any object-group External_nets object-group ICMP-returns log disable

access-list inside_access_in remark Permit outbound pings
access-list inside_access_in extended permit icmp object-group Internal_nets any echo log disable

access-list inside_access_in remark Permit traceroute from Cisco devices
access-list inside_access_in extended permit udp object-group Internal_nets any object-group Cisco_Traceroute_udp log disable

Obviously, this assumes sensible values for Internal_nets (e.g. 192.168.0.0/16) and External_nets (i.e. the public IP ranges assigned to your external interface).
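As a rough way to size the Cisco_Traceroute_udp range: classic Unix UDP traceroute increments the destination port once per probe, so with the usual three probes per hop each hop consumes three ports. The Python sketch below assumes exactly that (base port 33434, three probes per hop) and assumes your Cisco devices behave the same way; both `probe_port` and `hops_covered` are hypothetical helpers, and the probe count can be changed on both Cisco and Linux.

```python
# Hypothetical helpers, assuming the destination port is incremented once
# per probe, starting at 33434, with three probes per hop.
BASE_PORT = 33434
PROBES_PER_HOP = 3


def probe_port(hop: int, probe: int, base: int = BASE_PORT,
               probes_per_hop: int = PROBES_PER_HOP) -> int:
    """Destination port of probe `probe` (0-based) sent with TTL `hop` (1-based)."""
    return base + (hop - 1) * probes_per_hop + probe


def hops_covered(low: int, high: int,
                 probes_per_hop: int = PROBES_PER_HOP) -> int:
    """Complete hops whose probes all fall in low..high, when low is the base port."""
    return (high - low + 1) // probes_per_hop


if __name__ == "__main__":
    print(probe_port(10, 2))
    print(hops_covered(33434, 33464))
```

On these assumptions the 33434-33464 range covers paths of up to ten hops; if you regularly trace further, the object-group would need widening.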

As an addendum, the firewall is not (strictly speaking) a router, and therefore in many cases will not decrement the TTL, so it does not show up as a hop in the trace. I have found decrementing unnecessary in most cases, but if needed, it can be enabled as follows:


policy-map global_policy
 class class-default
  set connection decrement-ttl
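The effect of decrement-ttl can be pictured with a toy Python model (not real packet handling): each router decrements the TTL and answers with Time-Exceeded when it reaches zero, while a device that does not decrement simply passes the probe on and never appears as a hop. The function name and device names are invented for the illustration.

```python
# Toy model: which devices appear in a trace when some devices on the
# path do not decrement the TTL. visible_hops() is a hypothetical helper.
from typing import List, Tuple

Device = Tuple[str, bool]  # (name, decrements_ttl); the last device is the destination


def visible_hops(path: List[Device], max_ttl: int = 30) -> List[str]:
    hops = []
    destination = path[-1][0]
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        reporter = destination              # destination answers unless TTL expires first
        for name, decrements in path[:-1]:
            if decrements:
                remaining -= 1
                if remaining == 0:          # TTL expired: Time-Exceeded from this device
                    reporter = name
                    break
        hops.append(reporter)
        if reporter == destination:
            break
    return hops


if __name__ == "__main__":
    path = [("r1", True), ("asa", False), ("r2", True), ("server", True)]
    print(visible_hops(path))                          # firewall invisible
    print(visible_hops([(n, True) for n, _ in path]))  # with decrement-ttl
```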

Friday, 12 October 2012

Putty Class in VBScript

We have a fairly busy network, comprising several hundred Cisco devices across some fifty sites, and Putty is one of my mainstay tools for updating configs and general troubleshooting.

So when I started looking around for something quick and easy to carry out batched updates, I looked at Putty first. Using Putty for scripted tasks wasn't as easy as I had expected; the main problem was getting access to screen feedback so that I could verify that my commands had had the expected effect.

One solution is to turn on logging and use that as a proxy screen. Here's a VBScript class which includes some basic send and "receive" functionality. Error handling is stripped to a bare minimum to keep the size of the script down here, but hopefully it gives a flavour of what is possible.

Option Explicit
'===========================================================================
'Name:    Putty class
'Author:  Philip Damian-Grint
'Version: 1.0
'Date:    12th Oct 2012
'
'Description:
'
'  A starter VB class used to drive Putty sessions typically for Cisco
'  devices, allowing sending of commands, and returning screen output
'  to allow the possibility of conditional processing.
'
'  Putty has a number of logging options; for Cisco vty sessions, only 
'  printable output is required for line-based output processing, but 
'  full session output at least is required where escape sequences
'  need to be captured for screen positioning. (Not demonstrated here)
'===========================================================================

' Constants

Const EXELOC            = """c:\Program Files\Linux Utilities\PuTTY\putty.exe"""
Const LOG_PRINT         = "1"
Const LOG_SESSION       = "2"
Const MODE_LINE         = 0
Const MODE_CHAR         = 1
Const REGPUTTY          = "HKCU\Software\SimonTatham\PuTTY\Sessions\Default%20Settings\"
Const REGLGFILE         = "HKCU\Software\SimonTatham\PuTTY\Sessions\Default%20Settings\LogFileName"
Const REGLGTYPE         = "HKCU\Software\SimonTatham\PuTTY\Sessions\Default%20Settings\LogType"
Const STATUS_SUCCESS    = 0
Const STATUS_FAILURE    = -1

Class Putty

  'CLASS PRIVATE VARIABLES
  Private p_iLastTideMark
  Private p_iMode
  Private p_iStatus
  Private p_iWait
  Private p_oFSO
  Private p_oSession
  Private p_oWShell
  Private p_sEnable
  Private p_sHost
  Private p_sLogName
  Private p_sLogType
  Private p_sPasswd
  Private p_sTempDir
  Private p_sUser

  'CLASS CREATOR & DESTRUCTOR

  Private Sub Class_Initialize()
    Set p_oWShell = WScript.CreateObject( "WScript.Shell" )
    Set p_oFSO = WScript.CreateObject( "Scripting.FileSystemObject" ) 
    p_sLogType = LOG_PRINT ' default to printable output
    p_iLastTideMark = 0 ' initial tide mark
    p_iWait = 5         ' default to 5 seconds wait after each command
    p_iMode = MODE_LINE ' default to reading lines
  End Sub

  Private Sub Class_Terminate()
    ResetLog()                       ' Clear our registry settings
    If Not IsEmpty( p_sLogName ) Then ' Get rid of the temporary file, if one was created
      If p_oFSO.FileExists( p_sLogName ) Then p_oFSO.DeleteFile( p_sLogName )
    End If
    Set p_oWShell = Nothing
    Set p_oFSO = Nothing
    Set p_oSession = Nothing
  End Sub

  'CLASS PROPERTIES
 
 'enable() is WO
  Public Property Let enable( sEnable ) : p_sEnable = sEnable : End Property

 'host() is RW
  Public Property Let host( sHost ) : p_sHost = sHost : End Property
  Public Property Get host() : host = p_sHost : End Property

 'logtype() is RW
  Public Property Let logtype( sLogType ) : p_sLogType = sLogType : End Property
  Public Property Get logtype() : logtype = p_sLogType : End Property

 'mode() is RW
  Public Property Let mode( iMode ) : p_iMode = iMode : End Property
  Public Property Get mode() : mode = p_iMode : End Property

 'passwd() is WO
  Public Property Let passwd( sPasswd ) : p_sPasswd = sPasswd : End Property

 'status() is RO
  Public Property Get status() : status = p_iStatus : End Property

 'user() is RW
  Public Property Let user( sUser ) : p_sUser = sUser : End Property
  Public Property Get user() : user = p_sUser : End Property

 'wait() is RW
  Public Property Let wait( iWait ) : p_iWait = iWait : End Property
  Public Property Get wait() : wait = p_iWait : End Property

  'CLASS PRIVATE FUNCTIONS
  
  Private Function EnableLog ' Switch on Putty logging
    EnableLog = -1
    p_sLogName = p_oWShell.ExpandEnvironmentStrings( "%Temp%" ) & _
                "\" & p_oFSO.GetTempName()        
    If IsEmpty( p_oWShell.RegWrite( REGLGFILE, p_sLogName,"REG_SZ" ) ) AND _
       IsEmpty( p_oWShell.RegWrite( REGLGTYPE, p_sLogType, "REG_DWORD" ) ) Then
            EnableLog = 0
    End If
  End Function

  Private Function Quit( sReason ) ' Display message and Exit
    WScript.Echo sReason : WScript.Quit
  End Function

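  Private Function ResetLog ' Switch off Putty logging
    On Error Resume Next               ' Values may not exist if logging was never enabled
    p_oWShell.RegDelete( REGLGFILE )   ' Delete only the two values we wrote,
    p_oWShell.RegDelete( REGLGTYPE )   ' not the whole Default Settings key
    On Error Goto 0
  End Function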

  Private Function ReadLog ' Read latest output from Putty log
    Dim oFile : Set oFile = p_oFSO.OpenTextFile( p_sLogName )
    Dim iCount : iCount = 0
    Dim aLogLines(), sLogChars

    Do Until oFile.AtEndOfStream    ' Find our old tide mark
        If iCount < p_iLastTideMark Then
            oFile.SkipLine
        Else
            Redim Preserve aLogLines( iCount - p_iLastTideMark ) 
            aLogLines( iCount - p_iLastTideMark ) = oFile.ReadLine
        End If
       iCount = iCount + 1
    Loop
    p_iLastTideMark = iCount        ' New tidemark
    ReadLog = aLogLines             ' Return everything since the last tidemark
    oFile.Close
    Set oFile = Nothing
  End Function

  Private Function SendInput( sInput )  ' find Putty's active window and send keystrokes to it
    WScript.Sleep 3000               ' Or greater if debugging to give time for window switching
    Do 
        WScript.Sleep 100
    Loop until p_oWShell.AppActivate( p_oSession.ProcessID )    ' Find our session window
    p_oWShell.SendKeys( sInput & "{ENTER}" )        ' Do the deed
  End Function

  'CLASS METHODS

  Public Function Connect  ' Launch Putty 
    p_iStatus = STATUS_FAILURE          ' assume failure
    If (NOT IsEmpty( p_sUser ) AND _
            NOT IsEmpty( p_sPasswd ) AND _
            NOT IsEmpty( p_sHost ) ) Then
        If EnableLog <> 0 Then Quit( "Aborting - Can't update registry" )
        On Error Resume Next            ' graceful error handling
        Set p_oSession = p_oWShell.exec( EXELOC & " " & p_sHost & " -l " & _
                        p_sUser & " -pw " & p_sPasswd )
        WScript.Sleep 2000              ' Allow some time to settle down
        If ( ( p_oSession Is Nothing ) OR ( p_oSession.Status <> 0 ) ) Then Exit Function
        On Error Goto 0
        p_iStatus = STATUS_SUCCESS
        Connect = ReadLog()             ' Pass the initial screen back
    End If
  End Function

  Public Function Send( sChars ) ' Send a command and read the output after waiting iWait seconds
    SendInput( sChars )
    WScript.Sleep p_iWait * 1000
    Send = ReadLog()
  End Function

End Class

To demonstrate the class in use, we store the code above in a file called "classes.vbi", and then pull that file into puttytest.vbs (below) using an "Include" function.

All this demo does is log on to a Cisco device, send a command and log out, relaying any Putty screen output to our screen.

Tested with Putty version 0.60 under Windows XP SP3:

Option Explicit
'===========================================================================
'Name:    puttytest.vbs
'
'Description:
'
'   Wrapper to test our putty class
'   Run from command line:
'        cscript  puttytest.vbs
'===========================================================================
 
'Utility Functions

Function Include ( sFileVBI ) ' include an external vbs/vbi file
    Dim oFSO : Set oFSO = WScript.CreateObject( "Scripting.FileSystemObject" )
    Dim oFile : Set oFile = oFSO.OpenTextFile( sFileVBI )
    ExecuteGlobal oFile.ReadAll()
    oFile.Close : Set oFile = Nothing
    Set oFSO = Nothing
End Function

Function GetUserInfo( sPrompt ) ' prompt for input
    WScript.StdOut.Write( sPrompt )
    GetUserInfo = WScript.StdIn.ReadLine
End Function

Function GetPassword( sPrompt ) ' prompt for hidden input
    Dim oPasswd : Set oPasswd = WScript.CreateObject( "ScriptPW.Password" )
    WScript.StdOut.Write( sPrompt )
    GetPassword = oPasswd.GetPassword()
    Set oPasswd = Nothing
End Function

Function WriteLines( aOut ) ' print array of strings
    Dim sLine : For Each sLine in aOut
        WScript.StdOut.Write( sLine & VbCrLf )
    Next
End Function

'========================
' Test our Putty Class
'========================

Include "classes.vbi"

Dim aOutPut
Dim sLineOut
Dim sTextToSend

Dim oSession : Set oSession = New Putty     ' Create a new instance of our class

oSession.host = GetUserInfo( "Please type hostname: " )  ' Get some basic info
oSession.user = GetUserInfo( "Please type username: " )
oSession.passwd = GetPassword( "Please type password: " )

aOutPut = oSession.Connect                  ' and launch our putty session

If oSession.Status = STATUS_SUCCESS Then

    WriteLines( aOutPut )
    oSession.wait = 3                       ' we can set a timer for each command
    aOutPut = oSession.Send( "show ver" )   ' show version IOS command
    WriteLines( aOutPut )
    aOutPut = oSession.Send( " " )          ' usually runs to 2 screens
    WriteLines( aOutPut )
    aOutPut = oSession.Send( "logout" )     ' close session
    WriteLines( aOutPut )

Else

    WScript.Echo "Failed to launch Putty"
    
End If
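The part of the Putty class doing the real work is the tide mark in ReadLog: it remembers how many log lines have already been handed back, so each call returns only what Putty has written since. For anyone wanting the same trick outside VBScript, here is a minimal Python sketch of the idea; `TideMarkReader` and `read_new` are hypothetical names, and the demo simply appends to a temporary file to stand in for Putty.

```python
# Sketch of the "tide mark" technique used by ReadLog in the Putty class.
# TideMarkReader / read_new are hypothetical names for illustration.
import os
import tempfile


class TideMarkReader:
    """Return only the lines appended to a growing log since the last call."""

    def __init__(self, path: str):
        self.path = path
        self.tide_mark = 0            # number of lines already handed back

    def read_new(self) -> list:
        with open(self.path, "r", encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
        new_lines = lines[self.tide_mark:]
        self.tide_mark = len(lines)   # new tide mark
        return new_lines


if __name__ == "__main__":
    # Simulate Putty appending to its session log between two reads
    fd, log_name = tempfile.mkstemp()
    os.close(fd)
    reader = TideMarkReader(log_name)
    with open(log_name, "a") as log:
        log.write("Router>show ver\n")
    print(reader.read_new())
    with open(log_name, "a") as log:
        log.write("Cisco IOS Software\n--More--\n")
    print(reader.read_new())
    os.remove(log_name)
```

Like the VBScript, this counts a partially written last line as consumed; tracking a byte offset and holding back any unterminated tail would avoid that.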

Saturday, 4 August 2012

Two-way NAT / PAT on a VPN (Cisco) Stick

Some time ago I was tasked with interfacing to a couple of other multi-site organisations across a large governmental network similar in operation to the Internet. This was an interim measure prior to integrating aspects of the three networks into a single entity, and prior to having any dedicated WAN links in place.

I had to provide connectivity between a variable number of users and servers across all three networks, which had many overlapping IP ranges. The idea was to have a configuration flexible enough that I could easily add and change routes at the far end to keep pace with any integration work.

The last requirement was support for one or more AD trusts between the organisations, with DNS forwarding.

To make it as light a touch as possible for the far-end IT departments, I went for a single-interface Cisco router that could be connected directly to a DMZ on the far-end firewall.

The topology essentially looked like this, with extraneous devices stripped away and the Internet substituted for the (private) government network:

At a more useful level, showing physical interfaces and IP addresses:

NAT

Ideally, in order to allow variable numbers of users to cross the NAT boundary in either direction, one would use PAT in both directions. However, IOS only offers PAT for sources on the “inside” of the NAT boundary.
As a large number of users were likely to cross from the remote site, and only a few from the hub site, I had to make the remote physical interface act as “inside” and the tunnel act as “outside”.
This allowed me to use PAT for remote users and dynamic NAT for hub users.
Servers were easily handled by static NAT in both directions.

Non-NAT-Compliant Applications

Nowadays, most applications, including, to my surprise, Microsoft domain trusts, work quite well across (Cisco IOS) NAT boundaries. I found only one application that didn’t: an old version of HP OpenView ServiceDesk, which embeds the source IP of the HPOV server inside the Java client for use in a subsequent return connection.
In this particular instance, the server was based at the hub site and no IP conflict existed at the remote site, so I was able to create an identity NAT for the server in the direction of the hub, which worked fine once supporting routes were in place.

MTU issues

Because the remote firewall has not participated in creating the tunnel endpoint, it can’t respond correctly to hub-destined traffic that has the DF flag set. Our router has to send the ICMP unreachables (type 3, code 4, “fragmentation needed”) itself, so we have to ensure that the remote firewall allows them to pass from our router to devices on its internal network.
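In numbers: the tunnel MTU is the physical 1500 bytes minus the NAT-T, ESP and GRE overheads quoted in the tunnel configuration comments, and any larger packet with DF set has to be bounced back to its sender. A small Python sketch using those overhead figures as given; `needs_frag_needed` is a hypothetical helper.

```python
# Tunnel IP MTU from the overhead figures in the "ip mtu" config comments,
# and when a DF packet must be answered with ICMP type 3, code 4.
PHYSICAL_MTU = 1500
OVERHEAD = {"NAT-T": 8, "ESP (AES-256, SHA)": 53, "GRE": 24}  # figures from the config comments

TUNNEL_MTU = PHYSICAL_MTU - sum(OVERHEAD.values())  # matches "ip mtu 1415"


def needs_frag_needed(packet_len: int, df_set: bool, mtu: int = TUNNEL_MTU) -> bool:
    """Hypothetical helper: True if the router must drop the packet and send
    ICMP Destination Unreachable / fragmentation needed back to the sender."""
    return df_set and packet_len > mtu


if __name__ == "__main__":
    print(TUNNEL_MTU)
    print(needs_frag_needed(1500, True))   # the ICMP must reach the sender
    print(needs_frag_needed(1500, False))  # router may fragment instead
```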

Design Notes

Some basic notes might be required to clarify where all the addresses are coming from:

IPSEC and GRE tunnel end-points

The physical interfaces representing the Hub and Remote endpoints (192.168.12.9 and 192.168.22.13) have internal 192.168 addresses and are mapped to NAT addresses on their upstream firewalls. I have used 10.1.1.10 and 10.2.2.10 respectively.

Inter-Org NAT Allocation

Subnet 192.168.198.0/24 is used to dynamically NAT all Hub users accessing Remote servers
Subnet 192.168.199.0/24 is used to present Hub servers to Remote users.
IP Address 192.168.200.1 is used to PAT all Remote users accessing Hub servers
Subnet 192.168.200.0/24 is used to present Remote servers to Hub users.
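As a cross-check of this allocation, a toy Python model of how source addresses are rewritten in each direction. It is not how IOS implements NAT: there are no timeouts, the HUB-USERS and REMOTE-USERS ACLs are not modelled (any non-server source counts as a user), and the PAT port counter starting at 1024 is an arbitrary assumption. `TwoWayNat` and its methods are hypothetical names; the static entries are the example servers from the remote router config.

```python
# Toy model of the Inter-Org NAT allocation, NOT the IOS NAT engine:
# no timeouts, no ACL checks, hypothetical class/method names.
import ipaddress
import itertools

HUB_USER_POOL = ipaddress.ip_network("192.168.198.0/24")  # dynamic NAT for hub users
REMOTE_PAT_ADDR = "192.168.200.1"                          # PAT address for remote users

# Static entries, as in the remote router's example config
HUB_SERVERS = {"192.168.4.10": "192.168.199.5", "192.168.4.20": "192.168.199.6"}
REMOTE_SERVERS = {"192.168.4.50": "192.168.200.7", "192.168.4.51": "192.168.200.8"}


class TwoWayNat:
    def __init__(self):
        self._pool = iter(HUB_USER_POOL.hosts())  # .1 to .254
        self._dynamic = {}                        # hub user -> pool address
        self._ports = itertools.count(1024)       # PAT port allocator (arbitrary start)
        self._pat = {}                            # (remote ip, port) -> PAT port

    def from_hub(self, src: str) -> str:
        """Source address remote hosts see for a packet from the hub."""
        if src in HUB_SERVERS:
            return HUB_SERVERS[src]
        if src not in self._dynamic:              # one pool address per hub user
            self._dynamic[src] = str(next(self._pool))
        return self._dynamic[src]

    def from_remote(self, src: str, sport: int) -> tuple:
        """(address, port) hub hosts see for a packet from the remote site."""
        if src in REMOTE_SERVERS:
            return REMOTE_SERVERS[src], sport
        key = (src, sport)
        if key not in self._pat:                  # many users share one address
            self._pat[key] = next(self._ports)
        return REMOTE_PAT_ADDR, self._pat[key]


if __name__ == "__main__":
    nat = TwoWayNat()
    print(nat.from_hub("192.168.32.7"))
    print(nat.from_hub("192.168.4.10"))
    print(nat.from_remote("192.168.0.20", 51000))
    print(nat.from_remote("192.168.4.50", 445))
```

Note that 192.168.4.0/23 exists on both sides yet never collides: each side only ever sees the other's translated addresses.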

Configuration Fragments

The configurations below have been taken from working devices, with some minimal IP address obfuscation:

Hub Distribution Router:

! IKE Phase 1
crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 5

! Pre-shared key for remote site
crypto isakmp key RemoteSiteKey address 10.2.2.10

! IKE Phase 2
crypto ipsec transform-set AES256_SHA_tra esp-aes 256 esp-sha-hmac 
 mode transport

! Crypto ACL for GRE to remote site
ip access-list extended HUB-INTERNET-REMOTE-CRYACL
 remark Tunneled traffic over the Internet to Remote site
 permit gre host 192.168.12.9 host 10.2.2.10

! Crypto MAP entry for remote site
crypto map INTERNET-CM 15 ipsec-isakmp 
 set peer 10.2.2.10
 set transform-set AES256_SHA_tra 
 match address HUB-INTERNET-REMOTE-CRYACL

! Physical interface for termination of all WAN and Internet tunnels
interface GigabitEthernet0/1
 description Connects to Local Firewall inside
 ip address 192.168.12.9 255.255.255.248
 crypto map INTERNET-CM

! Tunnel to remote site
interface Tunnel14200
 description Tunnel over Internet to Remote Site
 ! low bandwidth used (EIGRP) for backup tunnels over Internet
 bandwidth 1000
 ip address 192.168.14.201 255.255.255.252
 ! Maximum starting MTU (1500 - 8(NAT-T) - 53(AES256) - 24(GRE))
 ip mtu 1415
 ! high delay used (EIGRP) for backup tunnels over Internet
 delay 2000
 tunnel source GigabitEthernet0/1
 tunnel destination 10.2.2.10
 ! Tell GRE to copy DF from inner to outer IP header
 tunnel path-mtu-discovery

ip prefix-list EIGRP-SITETUNNELS-OUT-PL description Route adverts to remote sites
ip prefix-list EIGRP-SITETUNNELS-OUT-PL seq 5 permit 0.0.0.0/0
ip prefix-list EIGRP-SITETUNNELS-OUT-PL seq 10 permit 192.168.0.0/16 le 32

router eigrp 192
 passive-interface GigabitEthernet0/1
 network 192.168.12.0
 network 192.168.14.0
 distribute-list prefix EIGRP-SITETUNNELS-OUT-PL out Tunnel14200
 no auto-summary
 no eigrp log-neighbor-changes
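The distribute-list limits what the hub advertises into the tunnel to the default route and 192.168 routes. As a rough illustration of how such prefix-list entries match (exact prefix length unless ge/le is given), here is a Python sketch; `prefix_match` and `advertised` are hypothetical helpers that handle IPv4 permit entries only and ignore sequence ordering and deny entries.

```python
# Hypothetical helpers illustrating prefix-list matching; IPv4, permit entries only.
import ipaddress
from typing import Optional


def prefix_match(route: str, entry: str, ge: Optional[int] = None,
                 le: Optional[int] = None) -> bool:
    r = ipaddress.ip_network(route)
    e = ipaddress.ip_network(entry)
    if not r.subnet_of(e):                    # must lie within the entry's prefix
        return False
    if ge is None and le is None:
        return r.prefixlen == e.prefixlen     # exact length match
    low = ge if ge is not None else e.prefixlen
    high = le if le is not None else 32
    return low <= r.prefixlen <= high


# EIGRP-SITETUNNELS-OUT-PL: seq 5 and seq 10
SITETUNNELS_OUT = [("0.0.0.0/0", None, None), ("192.168.0.0/16", None, 32)]


def advertised(route: str) -> bool:
    return any(prefix_match(route, e, ge, le) for e, ge, le in SITETUNNELS_OUT)


if __name__ == "__main__":
    for route in ("0.0.0.0/0", "192.168.4.0/23", "10.1.1.0/24"):
        print(route, advertised(route))       # anything unmatched is implicitly denied
```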

Hub Firewall:

PIX Version 7.2(2)

name 10.1.1.10 dist-rt02_INTERNET
name 192.168.12.9 dist-rt02_G01
name 10.2.2.10 remote-rt01_INTERNET

object-group service NAT-T udp
 description NAT Traversal
 port-object eq 4500

object-group service IPsec_udp udp
 description UDP protocols used by IPsec
 group-object NAT-T
 port-object eq isakmp

object-group network Cisco_Devices
 description Cisco devices' Internet interfaces
 network-object host remote-rt01_INTERNET

interface Ethernet0
 speed 100
 duplex full
 nameif outside
 security-level 0
 ip address 10.1.1.4 255.255.255.0 standby 10.1.1.5 

interface Ethernet1
 speed 100
 duplex full
 nameif inside
 security-level 100
 ip address 192.168.12.12 255.255.255.248 standby 192.168.12.13

route outside 0.0.0.0 0.0.0.0 10.1.1.1 1
route inside 192.168.0.0 255.255.0.0 192.168.12.9 1

! Mapping the routable Tunnel endpoint
static (inside,outside) dist-rt02_INTERNET dist-rt02_G01 netmask 255.255.255.255 

access-list inside-access-in remark Allow ISAKMP & NAT-T to sites using VPN-over-Internet
access-list inside-access-in extended permit udp host dist-rt02_G01 object-group Cisco_Devices object-group IPsec_udp log disable 

access-group inside-access-in in interface inside

Remote Firewall:

PIX Version 6.3(4)

interface ethernet0 100full
interface ethernet1 100full
interface ethernet4 100full

nameif ethernet0 outside security0
nameif ethernet1 inside security100
nameif ethernet4 HUBDMZ security49

ip address outside 10.2.2.4 255.255.255.0
ip address inside 192.168.0.5 255.255.254.0
ip address HUBDMZ 192.168.22.9 255.255.255.248

failover ip address outside 10.2.2.5
failover ip address inside 192.168.0.6
failover ip address HUBDMZ 192.168.22.10

object-group network HUB
  description HUBDMZ network
  network-object 192.168.22.8 255.255.255.248 
  description HUB users on this subnet
  network-object 192.168.198.0 255.255.255.0 
  description HUB servers on this subnet
  network-object 192.168.199.0 255.255.255.0 

! Minimal ACLs to permit traffic flow – not representative!
access-list inside_access_in permit ip any object-group HUB 
access-group inside_access_in in interface inside

access-list hubdmz_access_in permit icmp any any
access-list hubdmz_access_in permit ip host 192.168.22.13 host 10.1.1.10 
access-list hubdmz_access_in permit ip object-group HUB 192.168.0.0 255.255.0.0 
access-group hubdmz_access_in in interface HUBDMZ

access-list outside_access_in permit udp host 10.1.1.10 host 10.2.2.10 eq 4500 
access-list outside_access_in permit udp host 10.1.1.10 host 10.2.2.10 eq isakmp 
access-group outside_access_in in interface outside

! Bypass NAT for incoming HUB traffic (low security to high security)
access-list NO_NAT_HUBDMZ permit ip object-group HUB 192.168.0.0 255.255.0.0 
nat (HUBDMZ) 0 access-list NO_NAT_HUBDMZ

! Mapping the routable Tunnel endpoint
static (HUBDMZ,outside) 10.2.2.10 192.168.22.13 netmask 255.255.255.255 0 0 

! Hub users and servers respectively
route HUBDMZ 192.168.198.0 255.255.255.0 192.168.22.13 1
route HUBDMZ 192.168.199.0 255.255.255.0 192.168.22.13 1

Remote VPN Router:


! example hub hosts with pre(real) and post nat addresses (hub perspective)
ip host hubhost01 192.168.4.10 192.168.199.5
ip host hubhost02 192.168.4.20 192.168.199.6
! example remote hosts with "pre" and "post"(real) nat (hub perspective)
ip host remotehost01 192.168.200.7 192.168.4.50
ip host remotehost02 192.168.200.8 192.168.4.51

! need inspection to activate ALGs
ip inspect name INSPECT_LIST dns
ip inspect name INSPECT_LIST ftp
ip inspect name INSPECT_LIST https
ip inspect name INSPECT_LIST icmp
ip inspect name INSPECT_LIST imap
ip inspect name INSPECT_LIST pop3
ip inspect name INSPECT_LIST esmtp
ip inspect name INSPECT_LIST sqlnet
ip inspect name INSPECT_LIST streamworks
ip inspect name INSPECT_LIST tftp
ip inspect name INSPECT_LIST tcp
ip inspect name INSPECT_LIST udp
ip inspect name INSPECT_LIST vdolive
ip inspect name INSPECT_LIST kerberos
ip inspect name INSPECT_LIST ldap
ip inspect name INSPECT_LIST microsoft-ds

! IKE Phase 1
crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 5

! Pre-shared key for this site
crypto isakmp key RemoteSiteKey address 10.1.1.10

! IKE Phase 2
crypto ipsec transform-set AES256_SHA_tra esp-aes 256 esp-sha-hmac 
 mode transport

! Crypto ACL for GRE to hub site
ip access-list extended REMOTE-INTERNET-HUB-CRYACL
 remark Traffic tunnelled over Internet to HUB
 permit gre host 192.168.22.13 host 10.1.1.10

! Crypto MAP entry for hub site
crypto map INTERNET-CM 2 ipsec-isakmp 
 set peer 10.1.1.10
 set transform-set AES256_SHA_tra 
 match address REMOTE-INTERNET-HUB-CRYACL

! Single physical interface for LAN and VPN traffic
! in/out ACL not included in config
interface FastEthernet0/0
 description Exit to Internet and Remote LAN via Remote DMZ
 ip address 192.168.22.13 255.255.255.248
 no ip redirects
 ip inspect INSPECT_LIST in
 ! Treat the remote network as inside so we can use PAT
 ip nat inside
 ! enabled automatically with NAT config
 ip virtual-reassembly
 duplex full
 speed 100
 no cdp enable
 crypto map INTERNET-CM

interface Loopback0
 description Remote PAT address for overlapping client subnets
 ip address 192.168.200.1 255.255.255.0

interface Tunnel14200
 description Tunnel over Internet to Hub network
 ! low bandwidth used (EIGRP) for backup tunnels over Internet
 bandwidth 1000
 ip address 192.168.14.202 255.255.255.252
 ! Maximum starting MTU (1500-8(NAT-T)-53(AES256)-24(GRE))
 ip mtu 1415
 ! Required to allow PAT in the opposite direction
 ip nat outside
 ! enabled automatically with NAT config
 ip virtual-reassembly
 ! high delay used (EIGRP) for backup tunnels over Internet
 delay 2000
 tunnel source FastEthernet0/0
 tunnel destination 10.1.1.10
 tunnel path-mtu-discovery

router eigrp 192
 passive-interface Loopback0
 network 192.168.14.0
 network 192.168.200.0
 distribute-list prefix EIGRP-TUNNEL-OUT-PL out Tunnel14200
 no auto-summary
 eigrp stub connected

! Floating default route back to the hub over the tunnel
ip route 0.0.0.0 0.0.0.0 192.168.14.201 200

! Example remote site networks - 192.168.4.0 chosen to demonstrate overlaps
ip route 192.168.0.0 255.255.254.0 192.168.22.9
ip route 192.168.4.0 255.255.254.0 192.168.22.9
ip route 192.168.35.0 255.255.254.0 192.168.22.9

! Explicit route for our tunnel destination to avoid recursion
ip route 10.1.1.0 255.255.255.0 192.168.22.9

! We need the flexibility of PAT to be applied to the remote network
ip nat inside source list REMOTE-USERS interface Loopback0 overload

! Which leaves us on the "outside" using dynamic NAT
ip nat pool HUB-POOL 192.168.198.1 192.168.198.254 prefix-length 24
ip nat outside source list HUB-USERS pool HUB-POOL

! Example remote servers - DNS ALG will use these to translate our queries
ip nat inside source static 192.168.4.50 192.168.200.7
ip nat inside source static 192.168.4.51 192.168.200.8

! Example hub servers - DNS ALG will use these to translate their queries
ip nat outside source static 192.168.4.10 192.168.199.5
ip nat outside source static 192.168.4.20 192.168.199.6

! Define which remote subnets hide behind PAT
ip access-list standard REMOTE-USERS
 remark Remote main site
 permit 192.168.0.0 0.0.1.255
 remark Remote secondary site example
 permit 192.168.35.0 0.0.1.255

! Define which hub subnets hide behind Dynamic NAT
ip access-list standard HUB-USERS
 remark Hub IT department
 permit 192.168.32.0 0.0.0.255
 remark Hub main site
 permit 192.168.125.0 0.0.0.255
 remark Hub secondary site example
 permit 192.168.35.0 0.0.0.255

ip prefix-list EIGRP-TUNNEL-OUT-PL description Routes to be advertised from site
ip prefix-list EIGRP-TUNNEL-OUT-PL seq 5 permit 192.168.200.0/24

Postscript

The creation and ongoing support of Microsoft domain trusts across this two-way NAT boundary was reasonably straightforward. There were a couple of issues, neither of which was caused by, or really affected, the configuration itself, but they might be worth mentioning:

1. Problems creating a domain trust across two-way NAT
I found it useful to ensure that all DNS servers in both domains could see and forward to each other. In one of the organisations I needed to connect to, this was tiresome because they had at least six DCs, four of which were DNS servers, and each of these requires its own static NAT entry.
I also found that physical DCs were more reliable than VMs, in part because VMware Tools had not been installed thoughtfully: the Shared Folders option should not be installed, as it causes network (RPC) problems. However, you can't choose in advance which DCs will participate on each side, so it is useful to be able to mask off the suspect ones by removing their NAT entries.

2. Kerberos-related fragmentation
Depending upon the server and workstation versions, Kerberos may still default to UDP, which can cause performance problems due to fragmentation. This is particularly noticeable where W2K3 and XP are in use, and where there are many nested groups and SID histories to bloat the packets. It manifests itself as a delay in accessing resources across the trust. Debugging ip virtual-reassembly may show the maximum fragments or fragmentation buffer being exceeded, and some additional tweaking may be required to prevent timeouts and retransmissions within Kerberos.
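To get a feel for the numbers, the Python sketch below counts how many IPv4 fragments a single UDP message becomes at the 1415-byte tunnel MTU configured here, a figure that can be compared with the max-fragments setting of ip virtual-reassembly. It assumes a plain 20-byte IP header and ignores the GRE/IPsec encapsulation applied afterwards; `fragments_needed` is a hypothetical helper.

```python
# Hypothetical helper: IPv4 fragments needed for one UDP datagram at a
# given IP MTU, assuming a 20-byte IP header with no options.
import math

IP_HEADER = 20
UDP_HEADER = 8


def fragments_needed(udp_payload: int, mtu: int = 1415) -> int:
    ip_payload = udp_payload + UDP_HEADER
    # every fragment except the last must carry a multiple of 8 bytes
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return max(1, math.ceil(ip_payload / per_fragment))


if __name__ == "__main__":
    for size in (1000, 4000, 12000):
        print(size, fragments_needed(size))
```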

Sources

I found the following document very useful in getting this into production:
NAT Order of Operation

Saturday, 4 February 2012

MRTG Log Aggregator

Occasionally, I have needed to provide percentiles on a combined set of interfaces.
This requires a way of adding together samples from a number of log files, even though the sample timestamps might differ from file to file by a few minutes.
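Stripped of the file handling, the job comes down to two steps: add each sample from the other files into the nearest template sample, then take the percentile of the combined series. A minimal Python sketch of both, under stated assumptions: it follows the merge rule given in the script's header comments (a sample is added to the template entry with the nearest timestamp at or above it), skips samples newer than the whole template, and uses the nearest-rank definition of a percentile. `merge_into` and `percentile` are hypothetical names, and sorted keys with `bisect` stand in for the script's hash-based linked list.

```python
# Sketch only: merge further MRTG samples into a template, then take a
# percentile. Hypothetical names; bisect replaces the hash-based linked list.
import bisect
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, Sequence[int]]  # (epoch, [in_avg, out_avg, in_max, out_max])


def merge_into(template: Dict[int, List[int]], samples: List[Sample]) -> None:
    """Add each sample into the template entry with the nearest timestamp
    at or above it; values are combined by simple addition."""
    keys = sorted(template)
    for epoch, values in samples:
        i = bisect.bisect_left(keys, epoch)   # first template key >= epoch
        if i == len(keys):
            continue                          # newer than the whole template: skipped
        target = template[keys[i]]
        for j, v in enumerate(values):
            target[j] += v


def percentile(values: List[float], pct: float = 95.0) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of
    the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    template = {600: [10, 20, 30, 40], 300: [1, 2, 3, 4]}
    merge_into(template, [(550, (5, 5, 5, 5)), (290, (1, 1, 1, 1))])
    print(template)
    print(percentile([v[0] for v in template.values()]))
```

Other percentile definitions interpolate between ranks and can give slightly different answers on short series.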

Here, then, is my current hack for doing this. The merged data set is implemented as a doubly-linked list using nested hashes, not because this script needs the links, but because I lifted the structure from one of my other log manipulation tools. I will probably return to clean it up as time goes on.

#!/usr/bin/env perl
#
# NAME:         aggregate.pl
#
# AUTHOR:       Philip Damian-Grint
#
# DESCRIPTION:  Synthesize a new MRTG log file from 2 or more other log files.
#
#               This utility expects and generates version 2 MRTG log files
#               (see http://oss.oetiker.ch/mrtg/doc/mrtg-logfile.en.html), based on a
#               default sampling time of 5 minutes.
#               In general there are 600 samples each of 5mins, 30mins, 120mins
#               and 1440mins (one day). Each dataset is a quintuple:
#               {epoch, in_average, out_average, in_maximum, out_maximum}
#
#               The file with the newest timestamp is used as a template for generating
#               the output file, processed backwards in time.
#
#               Samples from the second and further logfiles are combined with the template
#               according to the following rules:
#
#               1.  Samples from the input logfile which fall between two samples in the
#                   template, are combined into the sample with the higher timestamp
#
#               2.  Samples are combined using basic addition only
#
#               Each of the input files is checked for time synchronisation. If the
#               starting time of any of the second and subsequent input files is more 
#               than 5 minutes adrift from the first input file, the utility aborts.
#
# INPUTS:       Options, Logfile1, Logfile2, ...
#               aggregate.pl [--verbose] Logfile1 [, Logfile2, ...]
#
# OUTPUTS:      Logfile in MRTG format version 2
#               This is written to STDOUT
#
# NOTES:        1.   It should go without saying that running this against live log files while
#                    MRTG is running will have unpredictable results - copy the logfiles to
#                    a location where they will not be disturbed while being processed.
#
#               2.  It is possible that due to occasional variations at sample period
#                   boundaries (e.g. 5mins / 30 mins) and between files, some "samples" in the
#                   merged file might combine one or two samples more than expected.
#                   It would be possible to avoid this by, say, adding a further field to each hash
#                   record to count and possibly restrict the samples combined from subsequent files.
#
# HISTORY:      3/2/2012: v1.0 created
#               8/2/2012: v1.1 header detection corrected
#

# PRAGMAS
use strict;

# GLOBALS
local $| = 1;                               # Autoflush STDOUT

# MODULES
use Getopt::Long;

# VARIABLES

# Parameters
my $verbose;

# Working Storage
my @fields;                                 # Holds fields from last record read
my $file_no;                                # Tracks current file being processed
my $inbytes_master;                         # Inbytes counter from the first file
my @keys;                                   # Holds sorted keys for merged dataset
my $outbytes_master;                        # Outbytes counter from the first file
my $prev_time;                              # Remember our previous timestamp
my $record_no;                              # Tracks last record read from current file
my $time_master;                            # First timestamp from first file
my $run_state;                              # Tracks processing phase (first file, subsequent file...)
my %samples;                                # Doubly-linked list representing merged file

# Subroutines
sub record_count {
    print "\r".++$record_no." of ".$file_no;
}

# INITIALISATION

GetOptions ("verbose" => \$verbose );       # Check for verbosity
$prev_time = 0;                             # Reset previous timestamp copy
$run_state = 'INIT';                        # Reset state
$time_master = 0;                           # Reset starting epoch

# MAIN BODY

# Process All Logfiles
while (<>) {
    chomp;                                  # Remove trailing newline
    @fields = split;                        # Split up our tuple on whitespace

    # Start of File Processing
    if (scalar(@fields) == 3) {             # Check for start of file
        print STDERR "\nStart of input file, datestamp: ".(scalar localtime($fields[0]))."\n" if ($verbose);
        $record_no = 0;                     # Reset record counter

        # First file
        if ($run_state eq 'INIT') {         # If this is our first file
            $time_master = $fields[0];      # Capture the header timestamp
            $inbytes_master = $fields[1];   # Capture the header inbytes
            $outbytes_master = $fields[2];  # Capture the header outbytes
            $run_state = 'FIRST';           # And update our state
            $file_no = 1;                   # Start counting input files

        # Subsequent files
        } else {
            # At the end of the first file (only)
            if ($run_state eq 'FIRST') {
                @keys = sort { $b <=> $a } (keys %samples);         # Sort our keys, newest first
                $run_state = 'SUBSQ';                               # Note that first file has ended
            }
            # And in all cases
            $file_no++;                     # Count input files
            $inbytes_master += $fields[1];  # Add header inbytes to master
            $outbytes_master += $fields[2]; # Add header outbytes to master

            # Other files must be within 5 minutes of the first
            die("Header timestamp difference > 5 minutes found in file ".$file_no."\n") if (abs($time_master - $fields[0]) > 300);
        }
        # Always count the record ($record_no drives the NEXT links and error messages), display only if verbose
        if ($verbose) { record_count(); } else { $record_no++; }
        $prev_time = $fields[0];            # Take a copy of this timestamp
        next;                               # Now start processing non-header records
    }

    # Check for "all-files" data mangling (records must run newest to oldest)
    die("\nIncreasing timestamp found in record ".$record_no." of file ".$file_no."\n") if ($fields[0] > $prev_time);

    # First file just populates our template
    if ($run_state eq 'FIRST') {

        # Check for "first-file" data mangling
        die("\nDuplicate timestamp found in record ".$record_no." of file ".$file_no."\n") if (exists $samples{$fields[0]});

        # Create a hash entry indexed by datestamp
        $samples{$fields[0]} = {PREV => ($prev_time == $fields[0]) ? undef : $prev_time, NEXT => undef, TUPLE => [@fields[1..4]]};

        # If not the first item in the list, update the last item's NEXT pointer
        $samples{$prev_time}{NEXT} = $fields[0] if ($record_no > 1);

    # Subsequent files must be merged
    } else {
        foreach (@keys) {                   # Keys run newest to oldest...
            if ($_ <= $fields[0]) {         # ...so the first match is the nearest sample at or before this one
                $samples{$_}{TUPLE}[0] += $fields[1];
                $samples{$_}{TUPLE}[1] += $fields[2];
                $samples{$_}{TUPLE}[2] += $fields[3];
                $samples{$_}{TUPLE}[3] += $fields[4];
                last;
            }
        }
    }
    $prev_time = $fields[0];                # Take a copy of this timestamp
    if ($verbose) { record_count(); } else { $record_no++; }
}

# Were we only given one file? @keys only populated on detection of a second file
die("\nError - only one input file supplied\n") unless (@keys);

# Output Merged File

# First our updated header record
print "$time_master $inbytes_master $outbytes_master\n";

# And then our records in reverse order
foreach (@keys) {
    print "$_ $samples{$_}{TUPLE}[0] $samples{$_}{TUPLE}[1] $samples{$_}{TUPLE}[2] $samples{$_}{TUPLE}[3]\n";
}
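The heart of the script is its merge rule: each record from a later file is added to the first file's sample with the newest timestamp at or before it. A minimal Python sketch of the same rule is shown here, using `bisect` in place of the linear scan over `@keys`. It is a hypothetical re-implementation, not the script itself: it assumes each log has already been parsed into a `(time, inbytes, outbytes)` header and a list of five-field records, newest first, and it leaves out the script's checks for out-of-order and duplicate timestamps.

```python
from bisect import bisect_right


def merge_logs(files):
    """Sum every later log into the time buckets of the first one.

    files: list of (header, records), where header is (time, inbytes, outbytes)
    and records is a list of (time, v1, v2, v3, v4) tuples, newest first.
    Returns (header, rows) in the same shape, rows newest first.
    """
    if len(files) < 2:
        raise ValueError("only one input file supplied")
    (t0, in0, out0), first = files[0]
    times = sorted(r[0] for r in first)              # ascending, as bisect requires
    totals = {r[0]: list(r[1:]) for r in first}      # template built from the first file
    for (t, inb, outb), records in files[1:]:
        if abs(t0 - t) > 300:
            raise ValueError("header timestamp difference > 5 minutes")
        in0, out0 = in0 + inb, out0 + outb
        for ts, *vals in records:
            i = bisect_right(times, ts)              # number of template times <= ts
            if i == 0:
                continue                             # older than every template sample: dropped, as in the Perl loop
            bucket = totals[times[i - 1]]            # newest template time <= ts
            for k, v in enumerate(vals):
                bucket[k] += v
    rows = [(ts, *totals[ts]) for ts in reversed(times)]
    return (t0, in0, out0), rows
```

Binary search replaces a scan of the whole template for every record with a logarithmic lookup, which starts to matter once the logs hold thousands of samples.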

Thursday, 19 January 2012

IPSEC: Tunnel vs Transport Mode

If you go looking for it, there is a whole stack of IPSEC documentation out there. It's mostly fairly dense, and tends to concentrate on explaining the somewhat complex operation and configuration details rather than exploring design choices.

One typical scenario is that you find yourself tasked with managing a multisite topology with redundant paths over third-party provider networks (private and public). The result is a requirement to encrypt all intersite traffic, for which the usual, and often the only, contender is IPSEC.

Those implementing IPSEC for the first time find that there are a large number of choices to be made, all of which may seem equally important. As a result, the final implementation often bears a strong resemblance to one of the examples on the Cisco site (with all the subtleties hidden and the important decisions pre-made).

One key decision involves the choice of operating mode: Tunnel or Transport.

Typically you find the differences between the two described in a number of ways such as:

- Tunnel mode is used between gateways, while Transport mode is used between end stations.
- Tunnel mode is used for pass-through traffic, while Transport mode is used for end-to-end traffic.
- Tunnel mode encrypts the whole packet and provides a new header, while Transport mode only encrypts the data (payload).

These descriptions contain hints and clues, but they don't really tell you why and when you should use each mode. Once you understand what the basic choice means, though, IPSEC suddenly becomes a lot more friendly.

Here's the question I think you should be asking:
"Which mode will best support my routing model?"

So, do you use dynamic routing or static routing? This matters because the reasoning you use to justify your routing choice is much the same reasoning you will use to make the Tunnel vs Transport choice.

Let's look at the static routing approach:
"I have a simple network and by using static routes I have complete control over what traffic is sent across my links."

So each static route is created at the 'source' of a link, directing traffic to the other end. This is much the same as a typical IPSEC Tunnel-mode link which uses ACLs to define "interesting traffic" at the 'source', to be sent to the other end.

But in order to get your traffic to traverse that IPSEC link, you must have both a static route and a corresponding (crypto) ACL; without both in place, the traffic won't be forwarded across the link. Both must also be mirrored by a matching route and ACL at the other end. So that's four manual entries for each definable flow, all of which must be updated if subnets or paths change.
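To make that bookkeeping concrete, here is a small Python sketch that generates the per-flow entries for both ends. The flows, next-hop addresses and ACL number are made up for illustration, and the output is limited to plain IOS `ip route` and extended `access-list` lines; a real configuration would also need the crypto map itself, and would de-duplicate routes shared by several flows.

```python
import ipaddress


def site_entries(flows, next_hop, acl=101):
    """Static route plus crypto ACL line for each (local, remote) flow at one site."""
    lines = []
    for local, remote in flows:
        lnet, rnet = ipaddress.ip_network(local), ipaddress.ip_network(remote)
        # Route the remote subnet towards the far end (hypothetical next hop)
        lines.append(f"ip route {rnet.network_address} {rnet.netmask} {next_hop}")
        # Mark local -> remote traffic as "interesting" (ACLs use wildcard masks)
        lines.append(f"access-list {acl} permit ip {lnet.network_address} {lnet.hostmask} "
                     f"{rnet.network_address} {rnet.hostmask}")
    return lines


def mirror(flows):
    """The far end sees every flow the other way round."""
    return [(remote, local) for local, remote in flows]


flows = [("10.1.0.0/16", "10.2.0.0/16"), ("10.1.0.0/16", "10.3.10.0/24")]   # hypothetical
site_a = site_entries(flows, "192.0.2.1")
site_b = site_entries(mirror(flows), "198.51.100.1")
print(len(site_a) + len(site_b), "entries for", len(flows), "flows")
```

Every new subnet pair adds another four lines, split across the two routers, which is exactly the maintenance burden described above.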

What about the dynamic routing approach?
"I have a complex network with multiple paths which I want to be discovered and utilised as needed by the network."

So someone taking this approach doesn't really want to be clumsily routing by ACL, but dynamically with a routing protocol. Trouble is, IPSEC Tunnel mode only handles unicast traffic, which would leave you with BGP as the only usable routing protocol.

You don't really want your routing configuration to have any dependencies on your IPSEC configuration at all. This is where Transport mode comes in:

Create an IPSEC Transport mode link between your pair of site routers and use it to carry only GRE traffic. The GRE tunnel gets its own /30 subnet and addressing, independent of the IPSEC link addressing.
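With several site pairs, the tunnel /30s can be carved out of a spare block mechanically. A sketch using Python's `ipaddress` module, assuming a hypothetical pool of 172.16.255.0/24 and a list of router pairs (the site names are made up):

```python
import ipaddress


def allocate_tunnels(links, pool="172.16.255.0/24"):
    """Give each (router_a, router_b) link its own /30 for the GRE tunnel addresses."""
    subnets = list(ipaddress.ip_network(pool).subnets(new_prefix=30))
    if len(links) > len(subnets):
        raise ValueError("pool too small for the number of tunnels")
    plan = {}
    for (a, b), net in zip(links, subnets):
        first, second = net.hosts()          # a /30 has exactly two usable addresses
        plan[(a, b)] = {a: f"{first}/30", b: f"{second}/30"}
    return plan


links = [("london", "leeds"), ("london", "york"), ("leeds", "york")]   # hypothetical routers
for link, ends in allocate_tunnels(links).items():
    print(link, ends)
```

Because these addresses belong only to the tunnels, re-addressing a site's LAN subnets never touches them.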

Now routing protocols that rely on multicast, such as OSPF and EIGRP, will run over the GRE tunnel and take care of all the other traffic.

The only ACL you need to define for IPSEC is one that matches GRE traffic to and from the peer router, and it won't change even if routing paths and subnet locations do. So that's four manual entries to take care of any number of definable flows, and they don't need to change unless one of the two site routers actually changes its address.

And that's all there is to it (at a high level).

So in summary:

- Tunnel mode IPSEC forces you to implement "Routing by Crypto-Map", which is ugly and unscalable, but it is appropriate for links between your external firewall and another organisation, for instance.
- Transport mode IPSEC (+GRE) frees up the routing design and makes it independent of the encryption implementation; it is therefore ideal for any internal links, WAN or LAN.

This is, in some ways, counter-intuitive: use Transport mode to carry tunnels, and use Tunnel mode to transport raw packets.

PS: If you use Transport mode IPSEC with GRE, don't forget that there are now two layers of encapsulation, so you will need to take extra care with fragmentation and MTU issues. At a minimum, you should have path MTU discovery enabled, ICMP unreachables NOT blocked, and DF bits copied from the original IP header to the GRE header.
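To put rough numbers on that, the sketch below works out the largest inner packet for a 1500-byte link. It assumes ESP with AES-CBC (16-byte blocks and IV) and HMAC-SHA1-96 (12-byte ICV), a basic 4-byte GRE header with no key, sequence number or checksum, and no NAT traversal; other transforms change the constants.

```python
def gre_max_inner(link_mtu: int = 1500) -> int:
    """Plain GRE: a new 20-byte IP header plus a 4-byte basic GRE header."""
    return link_mtu - 20 - 4


def gre_over_esp_transport_max_inner(link_mtu: int = 1500, block: int = 16,
                                     iv: int = 16, icv: int = 12) -> int:
    """Largest inner IP packet that fits in GRE carried by transport-mode ESP.

    Transport mode keeps the GRE packet's own 20-byte IP header and adds the
    ESP header and IV in front and the ICV behind. The GRE header, inner packet,
    padding, pad-length and next-header bytes are encrypted together and must
    fill a whole number of cipher blocks.
    """
    outer_ip, esp_header, gre_header, esp_trailer = 20, 8, 4, 2
    room = link_mtu - outer_ip - esp_header - iv - icv   # space left for the encrypted part
    room -= room % block                                  # round down to whole cipher blocks
    return room - gre_header - esp_trailer


for name, mtu in [("GRE only", gre_max_inner()),
                  ("GRE + ESP transport (AES, SHA1-96)", gre_over_esp_transport_max_inner())]:
    print(f"{name}: inner MTU {mtu}, TCP MSS {mtu - 40}")
```

Under these assumptions GRE alone leaves 1476 bytes for the inner packet, while GRE inside transport-mode ESP leaves at most 1434 (a TCP MSS of 1394), so leaving some margin below that figure for the tunnel's IP MTU is a sensible precaution.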

Saturday, 8 October 2011

Anatomy of a Netgear WNCE2001 Wireless Transceiver

Recently I bought a wireless transceiver from Netgear. It’s quite tidy at 8cm x 6cm x 1.5cm. I needed it to connect a bunch of Ethernet-only devices on a switch to a WPA2 wireless network.

It comes with two connectors: a standard RJ-45 Ethernet port and a power jack which can connect to USB or a standard power socket.

In order to configure this device (with SSID and PSK), the power lead is connected to a USB port on your PC and the device's Ethernet port is connected to your PC's Ethernet port. After the power and LAN lights go solid, start up your browser and you will automatically be taken to a page to configure the WLAN. Once configured, it operates as a bridge between the Ethernet and wireless segments.

The level of sophistication required to perform the configuration means that this little box must have at least an embedded DHCP server, web server and DNS server on it. Even then, it still has to subvert my request for a home page out on the internet and give me back a configuration wizard page.
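One way a box like this can subvert the request is a DNS server that answers every lookup with its own address, which is what the trace that follows appears to show. A minimal Python sketch of that wildcard behaviour, assuming a single-question query arriving over UDP; it is illustrative only, not Netgear's firmware, and the 60-second TTL is an arbitrary choice. Non-A queries get an empty answer here, which is tidier than what the device seems to do (note the malformed SOA reply in the trace).

```python
import socket
import struct


def build_query(name: str, txid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal one-question DNS query with Recursion Desired set."""
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    return struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0) + qname + struct.pack("!HH", qtype, 1)


def wildcard_answer(query: bytes, own_ip: str, ttl: int = 60) -> bytes:
    """Answer any A query with own_ip; other query types get an empty answer."""
    txid, flags = struct.unpack("!HH", query[:4])
    pos = 12                                    # the question follows the 12-byte header
    while query[pos] != 0:                      # walk the length-prefixed QNAME labels
        pos += 1 + query[pos]
    pos += 1                                    # step over the terminating zero-length label
    qtype, qclass = struct.unpack("!HH", query[pos:pos + 4])
    question = query[12:pos + 4]
    reply_flags = 0x8400 | (flags & 0x0100)     # QR + AA, echoing the client's RD bit
    if qtype != 1:
        return struct.pack("!6H", txid, reply_flags, 1, 0, 0, 0) + question
    # Answer RR: pointer to the name at offset 12, type A, same class, TTL, 4-byte address
    answer = struct.pack("!HHHIH", 0xC00C, 1, qclass, ttl, 4) + socket.inet_aton(own_ip)
    return struct.pack("!6H", txid, reply_flags, 1, 1, 0, 0) + question + answer
```

Wrapped in a loop around a UDP socket on port 53, this is enough to send every browser lookup to the device's own web server.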

So I started up Wireshark and watched what happened.

This first section shows the device starting up and settling down, ready for configuration:

No.     Time                       Source                Destination           Protocol Info
      1 2011-10-06 16:53:59.741855 CompalIn_fb:67:4c     Nearest               EAPOL    Start
        *** The PC is a potential supplicant requesting authentication in an 802.1X port access control environment (ignored hereafter)

      2 2011-10-06 16:54:02.519609 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x3df3ee15
        *** DHCP requests repeated a few times (omitted from here) during INIT-REBOOT state

     10 2011-10-06 16:54:25.718628 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x2ca33bbe
        ***Here is the first DHCP request from the PC (requesting verification of its last used address) that receives a response

        User Datagram Protocol, Src Port: 68 (68), Dst Port: 67 (67)
        Bootstrap Protocol
            Message type: Boot Request (1)
            Transaction ID: 0x2ca33bbe
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Request
            Option: (t=61,l=7) Client identifier
            Option: (t=50,l=4) Requested IP Address = 192.168.1.73
            Option: (t=12,l=6) Host Name = "PCNAME"
            Option: (t=81,l=31) Client Fully Qualified Domain Name
            Option: (t=60,l=8) Vendor class identifier = "MSFT 5.0"
            Option: (t=55,l=11) Parameter Request List
            Option: (t=43,l=3) Vendor-Specific Information

     12 2011-10-06 16:54:25.757167 192.168.1.251         255.255.255.255       DHCP     DHCP NAK      - Transaction ID 0x2ca33bbe
        *** The Netgear DHCP server tells the PC not to use the address in its last DHCP request
        *** Interesting choice of address by Netgear - keeping away from first or last in case of overlap with the outside WLAN DHCP server?

        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0x2ca33bbe
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP NAK
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.251

     13 2011-10-06 16:54:26.838914 0.0.0.0               255.255.255.255       DHCP     DHCP Discover - Transaction ID 0x5391d124
        *** The PC starts from scratch and asks for a new allocation

        Bootstrap Protocol
            Message type: Boot Request (1)
            Transaction ID: 0x5391d124
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Discover
            Option: (t=116,l=1) DHCP Auto-Configuration = AutoConfigure
            Option: (t=61,l=7) Client identifier
            Option: (t=12,l=6) Host Name = "PCNAME"
            Option: (t=60,l=8) Vendor class identifier = "MSFT 5.0"
            Option: (t=55,l=11) Parameter Request List
            Option: (t=43,l=2) Vendor-Specific Information

     14 2011-10-06 16:54:26.862836 Netgear_77:cf:fe      Broadcast             ARP      Who has 192.168.1.100?  Tell 192.168.1.251
        *** Netgear checks that no one is already using the address it is about to offer to the PC

     15 2011-10-06 16:54:28.929641 192.168.1.251         192.168.1.100         DHCP     DHCP Offer    - Transaction ID 0x5391d124
        *** Netgear gives the PC a new address, and tells it to use the Netgear address for gateway and DNS queries

        User Datagram Protocol, Src Port: 67 (67), Dst Port: 68 (68)
        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0x5391d124
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 192.168.1.100 (192.168.1.100)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Offer
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.251
            Option: (t=51,l=4) IP Address Lease Time = 1 minute
            Option: (t=1,l=4) Subnet Mask = 255.255.255.0
            Option: (t=3,l=4) Router = 192.168.1.251
            Option: (t=6,l=4) Domain Name Server = 192.168.1.251

     16 2011-10-06 16:54:28.930805 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x5391d124
        *** PC selects its best (only) offer and asks for confirmation

        Bootstrap Protocol
            Message type: Boot Request (1)
            Transaction ID: 0x5391d124
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Request
            Option: (t=61,l=7) Client identifier
            Option: (t=50,l=4) Requested IP Address = 192.168.1.100
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.251
            Option: (t=12,l=6) Host Name = "PCNAME"
            Option: (t=81,l=31) Client Fully Qualified Domain Name
            Option: (t=60,l=8) Vendor class identifier = "MSFT 5.0"
            Option: (t=55,l=11) Parameter Request List
            Option: (t=43,l=3) Vendor-Specific Information

     17 2011-10-06 16:54:28.973606 192.168.1.251         192.168.1.100         DHCP     DHCP ACK      - Transaction ID 0x5391d124
        *** And Netgear confirms. Note that the lease time is only 1 minute

        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0x5391d124
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 192.168.1.100 (192.168.1.100)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP ACK
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.251
            Option: (t=51,l=4) IP Address Lease Time = 1 minute
            Option: (t=1,l=4) Subnet Mask = 255.255.255.0
            Option: (t=3,l=4) Router = 192.168.1.251
            Option: (t=6,l=4) Domain Name Server = 192.168.1.251


     18 2011-10-06 16:54:28.999218 CompalIn_fb:67:4c     Broadcast             ARP      Gratuitous ARP for 192.168.1.100 (Request)
        *** The PC sends out a few of these to make sure other devices update their ARP tables

     22 2011-10-06 16:54:31.933507 CompalIn_fb:67:4c     Broadcast             ARP      Who has 192.168.1.251?  Tell 192.168.1.100
     23 2011-10-06 16:54:31.933929 Netgear_77:cf:fe      CompalIn_fb:67:4c     ARP      192.168.1.251 is at c4:3d:c7:77:cf:fe
        *** Then it resolves the gateway's address to update its own ARP table

     *** This is a work PC from an AD and Novell environment. Interesting that Netgear chooses to answer these queries
     *** The strategy seems to be to resolve all names to itself
     28 2011-10-06 16:54:32.002898 192.168.1.100         192.168.1.251         DNS      Standard query SRV _ldap._tcp.WORKCampus._sites.dc._msdcs.subdom.workdomain.co.uk
     29 2011-10-06 16:54:32.003300 192.168.1.251         192.168.1.100         DNS      Standard query response SRV
     30 2011-10-06 16:54:32.005632 192.168.1.100         192.168.1.251         DNS      Standard query SOA PCNAME.subdom.workdomain.co.uk
     31 2011-10-06 16:54:32.006002 192.168.1.251         192.168.1.100         DNS      Standard query response SOA[Malformed Packet]
     33 2011-10-06 16:54:32.046312 192.168.1.100         192.168.1.251         DNS      Standard query A time1.workdomain.co.uk
     34 2011-10-06 16:54:32.046677 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251

     35 2011-10-06 16:54:32.054459 192.168.1.100         192.168.1.251         NTP      NTP symmetric active
     36 2011-10-06 16:54:32.055120 192.168.1.251         192.168.1.100         ICMP     Destination unreachable (Port unreachable)
        *** Of course, having inadvertently told the PC that it is a time server, it can't actually handle the ntp request

     *** We get a similar story with SLP:
     37 2011-10-06 16:54:32.237637 192.168.1.100         192.168.1.251         DNS      Standard query A slp1.workdomain.co.uk
     38 2011-10-06 16:54:32.237933 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
     39 2011-10-06 16:54:32.238284 192.168.1.100         192.168.1.251         SRVLOC   Service Request, V2 XID - 3755
     40 2011-10-06 16:54:32.238387 192.168.1.100         192.168.1.251         DNS      Standard query A slp2.workdomain.co.uk
     41 2011-10-06 16:54:32.238623 192.168.1.251         192.168.1.100         ICMP     Destination unreachable (Port unreachable)
     42 2011-10-06 16:54:32.238640 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
     43 2011-10-06 16:54:32.238909 192.168.1.100         192.168.1.251         SRVLOC   Service Request, V1 Transaction ID - 3756
     44 2011-10-06 16:54:32.239248 192.168.1.251         192.168.1.100         ICMP     Destination unreachable (Port unreachable)

     *** Must be time to refresh the ARP cache:
     75 2011-10-06 16:54:37.053065 Netgear_77:cf:fe      CompalIn_fb:67:4c     ARP      Who has 192.168.1.100?  Tell 192.168.1.251
     76 2011-10-06 16:54:37.053085 CompalIn_fb:67:4c     Netgear_77:cf:fe      ARP      192.168.1.100 is at 00:1b:38:fb:67:4c
     89 2011-10-06 16:54:40.749850 192.168.1.100         192.168.1.251         ICMP     Echo (ping) request  (id=0x0300, seq(be/le)=16896/66, ttl=1)
     90 2011-10-06 16:54:40.750236 192.168.1.251         192.168.1.100         ICMP     Echo (ping) reply    (id=0x0300, seq(be/le)=16896/66, ttl=64)

     *** Now the Netgear appears to be taking on NetBIOS name server (NBNS) functions for the segment, answering the PC's name registration with its own address - only the first of a few exchanges is shown
     92 2011-10-06 16:54:41.009843 192.168.1.100         192.168.1.255         NBNS     Registration NB PCNAME<00>
        NetBIOS Name Service
            Transaction ID: 0x8906
            Flags: 0x2910 (Registration)
                0... .... .... .... = Response: Message is a query
                .010 1... .... .... = Opcode: Registration (5)
                .... ..0. .... .... = Truncated: Message is not truncated
                .... ...1 .... .... = Recursion desired: Do query recursively
                .... .... ...1 .... = Broadcast: Broadcast packet
            Questions: 1
            Answer RRs: 0
            Authority RRs: 0
            Additional RRs: 1
            Queries
                PCNAME<00>: type NB, class IN
                    Name: PCNAME<00> (Workstation/Redirector)
                    Type: NB
                    Class: IN
            Additional records
                PCNAME<00>: type NB, class IN
                    Name: PCNAME<00> (Workstation/Redirector)
                    Type: NB
                    Class: IN
                    Time to live: 3 days, 11 hours, 20 minutes
                    Data length: 6
                    Flags: 0x6000 (H-node, unique)
                        0... .... .... .... = Unique name
                        .11. .... .... .... = H-node
                    Addr: 192.168.1.100

     93 2011-10-06 16:54:41.011045 192.168.1.251         192.168.1.100         NBNS     Name query response NB 192.168.1.251
        NetBIOS Name Service
            Transaction ID: 0x8906
            Flags: 0x8400 (Name query response, No error)
                1... .... .... .... = Response: Message is a response
                .000 0... .... .... = Opcode: Name query (0)
                .... .1.. .... .... = Authoritative: Server is an authority for domain
                .... ..0. .... .... = Truncated: Message is not truncated
                .... ...0 .... .... = Recursion desired: Don't do query recursively
                .... .... 0... .... = Recursion available: Server can't do recursive queries
                .... .... ...0 .... = Broadcast: Not a broadcast packet
                .... .... .... 0000 = Reply code: No error (0)
            Questions: 0
            Answer RRs: 1
            Authority RRs: 0
            Additional RRs: 1
            Answers
                PCNAME<00>: type NB, class IN
                    Name: PCNAME<00> (Workstation/Redirector)
                    Type: NB
                    Class: IN
                    Time to live: 3117 days, 12 hours, 16 minutes
                    Data length: 1536
                    Flags: 0x4000 (M-node, unique)
                        0... .... .... .... = Unique name
                        .10. .... .... .... = M-node
                    Addr: 192.168.1.251

    *** Netgear's strategy of resolving all names to itself means it has to fend off Novell traffic as well:
    102 2011-10-06 16:54:43.389929 192.168.1.100         192.168.1.251         DNS      Standard query A NOVELL-NDS-TREE.subdom.workdomain.co.uk
    103 2011-10-06 16:54:43.390306 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
    104 2011-10-06 16:54:43.390479 192.168.1.100         192.168.1.251         TCP      3438 > 524 [SYN] Seq=0 Win=65535 Len=0 MSS=1260 SACK_PERM=1
    105 2011-10-06 16:54:43.390809 192.168.1.251         192.168.1.100         TCP      524 > 3438 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
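Two fields in the Netgear's NBNS answer (packet 93) look odd: a TTL of 3117 days and a data length of 1536 for what should be a 6-byte record (the PC's own registration in packet 92 shows a data length of 6). Both become sensible when their bytes are read in the opposite order, which suggests, though does not prove, that the responder wrote them in little-endian host order rather than network byte order. A quick check in Python, assuming Wireshark's day/hour/minute display dropped no seconds:

```python
import struct


def byte_swapped(value: int, size: int) -> int:
    """Read a field decoded in network (big-endian) order as if it were little-endian."""
    fmt = {2: "H", 4: "I"}[size]
    return struct.unpack("<" + fmt, struct.pack(">" + fmt, value))[0]


# Values Wireshark decoded from packet 93
ttl = ((3117 * 24 + 12) * 60 + 16) * 60   # "3117 days, 12 hours, 16 minutes" in seconds
data_length = 1536

print(byte_swapped(ttl, 4))          # 3600, i.e. a one-hour TTL
print(byte_swapped(data_length, 2))  # 6, the expected NB record length
```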

The middle section runs from launching the browser through to the start of the wizard:

    *** Now we start up Firefox - interesting to note that Firefox appears to look up the home page address before it checks for proxy settings
    164 2011-10-06 16:55:00.309853 192.168.1.100         192.168.1.251         DNS      Standard query A en-gb.start3.mozilla.com
    165 2011-10-06 16:55:00.310404 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
        *** Predictably, Netgear resolves the address to itself

    167 2011-10-06 16:55:00.669057 192.168.1.100         192.168.1.251         DNS      Standard query A wpad.subdom.workdomain.co.uk
    168 2011-10-06 16:55:00.669391 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
        *** Standard behaviour when proxy auto-detection is enabled - go looking for WPAD

    171 2011-10-06 16:55:00.680058 192.168.1.100         192.168.1.251         TCP      3444 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1260 SACK_PERM=1
    172 2011-10-06 16:55:00.680401 192.168.1.251         192.168.1.100         TCP      80 > 3444 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1
    173 2011-10-06 16:55:00.680419 192.168.1.100         192.168.1.251         TCP      3444 > 80 [ACK] Seq=1 Ack=1 Win=65535 Len=0
        *** TCP handshake with the Netgear web server

    174 2011-10-06 16:55:00.680489 192.168.1.100         192.168.1.251         HTTP     GET /wpad.dat HTTP/1.1 
        *** Firefox asks for the JavaScript proxy auto-config (PAC) script

    176 2011-10-06 16:55:00.689129 192.168.1.251         192.168.1.100         HTTP     HTTP/1.1 200 OK  (text/html)
        *** And Netgear tries to redirect it; however, this doesn't include a "function FindProxyForURL(url, host)", so it will be ignored

        Hypertext Transfer Protocol
        Line-based text data: text/html
            <html>\r\n
            \t<head>\r\n
            \r\n
            \t\t<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\n
            \t\t<meta http-equiv="pragma" content="no-cache"> \n
            \t\t<meta http-equiv="cache-control" content="no-cache, must-revalidate"> \n
            \t\t<meta http-equiv="expires" content="0">\t\t<script language="javascript" type="text/javascript">\r\n
            \t\t\tlocation.replace("http://www.mywifiext.net/");\r\n
            \t\t</script>\r\n
            \t</head>\r\n
            \t<body> \r\n
            \t</body>\r\n
            </html>\r\n
            \r\n

    *** Sure enough, Firefox ploughs on to open its original site directly
    177 2011-10-06 16:55:00.761313 192.168.1.100         192.168.1.251         TCP      3445 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1260 SACK_PERM=1
    178 2011-10-06 16:55:00.761650 192.168.1.251         192.168.1.100         TCP      80 > 3445 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1
    179 2011-10-06 16:55:00.761663 192.168.1.100         192.168.1.251         TCP      3445 > 80 [ACK] Seq=1 Ack=1 Win=65535 Len=0
    180 2011-10-06 16:55:00.761738 192.168.1.100         192.168.1.251         HTTP     GET /firefox?client=firefox-a&rls=org.mozilla:en-GB:official HTTP/1.1 
        Hypertext Transfer Protocol
            GET /firefox?client=firefox-a&rls=org.mozilla:en-GB:official HTTP/1.1\r\n
            Host: en-gb.start3.mozilla.com\r\n
            User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 ( .NET CLR 3.5.30729; .NET4.0E)\r\n
            Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
            Accept-Language: en-gb,en;q=0.5\r\n
            Accept-Encoding: gzip,deflate\r\n
            Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
            Keep-Alive: 115\r\n
            Connection: keep-alive\r\n
            Cookie: WT_FPC=id=194.176.105.39-1877985552.30159488:lv=1313671495942:ss=1313671495942\r\n
            \r\n

    182 2011-10-06 16:55:00.764144 192.168.1.251         192.168.1.100         HTTP     HTTP/1.1 200 OK  (text/html)
        *** Netgear has another attempt at redirecting. Note the headers have revealed that Netgear is using lighttpd web server

        Hypertext Transfer Protocol
            HTTP/1.1 200 OK\r\n
            Expires: Sat, 01 Jan 2000 00:00:12 GMT\r\n
            Cache-Control: max-age=1\r\n
            Content-Type: text/html\r\n
            Accept-Ranges: bytes\r\n
            Content-Length: 419\r\n
            Date: Sat, 01 Jan 2000 00:00:55 GMT\r\n
            Server: lighttpd/1.4.18\r\n
            \r\n
        Line-based text data: text/html
            <html>\r\n
            \t<head>\r\n
            \r\n
            \t\t<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\n
            \t\t<meta http-equiv="pragma" content="no-cache"> \n
            \t\t<meta http-equiv="cache-control" content="no-cache, must-revalidate"> \n
            \t\t<meta http-equiv="expires" content="0">\t\t<script language="javascript" type="text/javascript">\r\n
            \t\t\tlocation.replace("http://www.mywifiext.net/");\r\n
            \t\t</script>\r\n
            \t</head>\r\n
            \t<body> \r\n
            \t</body>\r\n
            </html>\r\n
            \r\n

    183 2011-10-06 16:55:00.839579 192.168.1.100         192.168.1.251         DNS      Standard query A www.mywifiext.net
    184 2011-10-06 16:55:00.839907 192.168.1.251         192.168.1.100         DNS      Standard query response A 192.168.1.251
        *** The redirection appears to work as Firefox now locates the new webpage

    185 2011-10-06 16:55:00.840931 192.168.1.100         192.168.1.251         TCP      3446 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1260 SACK_PERM=1
    186 2011-10-06 16:55:00.841281 192.168.1.251         192.168.1.100         TCP      80 > 3446 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1
    187 2011-10-06 16:55:00.841300 192.168.1.100         192.168.1.251         TCP      3446 > 80 [ACK] Seq=1 Ack=1 Win=65535 Len=0
    188 2011-10-06 16:55:00.841371 192.168.1.100         192.168.1.251         HTTP     GET / HTTP/1.1 
        *** And we're off - the first page of our wizard
        *** Yards of omitted stuff after this...
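The web side of the trick can be sketched the same way: answer every request, whatever its path or `Host` header, with the tiny redirect page seen in packets 176 and 182, except when the browser asks for www.mywifiext.net itself. A minimal Python sketch using the standard library's `http.server`; the wizard placeholder is hypothetical, and the real device runs lighttpd rather than anything like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

WIZARD_HOST = "www.mywifiext.net"    # the host the redirect points at, as seen in the trace


def redirect_page(target: str = f"http://{WIZARD_HOST}/") -> bytes:
    """A cut-down version of the page the Netgear returned for /wpad.dat and the home page."""
    html = (
        "<html>\r\n\t<head>\r\n"
        '\t\t<meta http-equiv="pragma" content="no-cache">\n'
        '\t\t<script language="javascript" type="text/javascript">\r\n'
        f'\t\t\tlocation.replace("{target}");\r\n'
        "\t\t</script>\r\n\t</head>\r\n\t<body>\r\n\t</body>\r\n</html>\r\n"
    )
    return html.encode("iso-8859-1")


class RedirectEverything(BaseHTTPRequestHandler):
    """Send every GET to the wizard host, and serve a placeholder wizard on that host."""

    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        if host == WIZARD_HOST:
            body = b"<html><body>Configuration wizard would go here</body></html>"  # hypothetical
        else:
            body = redirect_page()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Cache-Control", "max-age=1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet
```

To try it locally, run `HTTPServer(("127.0.0.1", 8080), RedirectEverything).serve_forever()` and point a browser at it. Because the page defines no `FindProxyForURL` function, a browser fetching it as `/wpad.dat` would discard it, just as Firefox did in the trace.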

Through the wizard, Netgear lists all of the visible SSIDs, allowing one to be selected and a key to be entered. The last section shows what happens as the Netgear associates to the WLAN and then turns into a bridge (or does it?)

   *** ...and as the last bit of data is finally posted from the wizard:
   1375 2011-10-06 16:56:20.549963 192.168.1.100         192.168.1.251         HTTP     POST /my_cgi.cgi?0.774414189696751 HTTP/1.1  (application/x-www-form-urlencoded)
   1379 2011-10-06 16:56:20.745822 192.168.1.251         192.168.1.100         HTTP/XML HTTP/1.1 200 OK 

   1381 2011-10-06 16:56:21.033167 0.0.0.0               255.255.255.255       DHCP     DHCP Discover - Transaction ID 0x7f52875c
        *** Netgear behaviour suggests that it is now attempting to become a DHCP client on someone else's segment
        *** However, Netgear now has two segments (WLAN and Ethernet), so it appears to be bridging between the two media

        Bootstrap Protocol
            Message type: Boot Request (1)
            Transaction ID: 0x7f52875c
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: Netgear_77:cf:fe (c4:3d:c7:77:cf:fe)
            Option: (t=53,l=1) DHCP Message Type = DHCP Discover
            Option: (t=61,l=7) Client identifier
            Option: (t=12,l=8) Host Name = "wnce2001"
            Option: (t=60,l=15) Vendor class identifier = "udhcp 0.9.9-pre"
            Option: (t=55,l=10) Parameter Request List

   *** This is confirmed, because we can now at last see traffic from the WLAN coming through:
   1382 2011-10-06 16:56:21.358369 ThomsonT_1b:1c:14     Broadcast             ARP      Who has 192.168.1.74?  Tell 192.168.1.254
   1383 2011-10-06 16:56:21.972481 ThomsonT_1b:1c:14     Broadcast             ARP      Who has 192.168.1.74?  Tell 192.168.1.254
   1385 2011-10-06 16:56:23.061375 0.0.0.0               255.255.255.255       DHCP     DHCP Discover - Transaction ID 0x7f52875c
   1386 2011-10-06 16:56:23.202151 ThomsonT_1b:1c:14     Broadcast             ARP      Who has 192.168.1.74?  Tell 192.168.1.254
        *** This looks like the WLAN router checking that an address is free:

        Address Resolution Protocol (request)
            Hardware type: Ethernet (0x0001)
            Protocol type: IP (0x0800)
            Hardware size: 6
            Protocol size: 4
            Opcode: request (0x0001)
            [Is gratuitous: False]
            Sender MAC address: ThomsonT_1b:1c:14 (00:26:44:1b:1c:14)
            Sender IP address: 192.168.1.254 (192.168.1.254)
            Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
            Target IP address: 192.168.1.74 (192.168.1.74)

   1387 2011-10-06 16:56:23.205098 192.168.1.254         255.255.255.255       DHCP     DHCP Offer    - Transaction ID 0x7f52875c
        *** ... before offering it to Netgear. Note that the lease is a decent length now.

        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0x7f52875c
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 192.168.1.74 (192.168.1.74)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 192.168.1.254 (192.168.1.254)
            Client MAC address: Netgear_77:cf:fe (c4:3d:c7:77:cf:fe)
            Option: (t=53,l=1) DHCP Message Type = DHCP Offer
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.254
            Option: (t=51,l=4) IP Address Lease Time = 1 day
            Option: (t=1,l=4) Subnet Mask = 255.255.255.0
            Option: (t=6,l=4) Domain Name Server = 192.168.1.254
            Option: (t=15,l=4) Domain Name = "home"
            Option: (t=3,l=4) Router = 192.168.1.254
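The Offer above carries option 51, a one-day lease. Under RFC 2131 the client starts renewing (unicast to the server) at T1 and rebinding (broadcast) at T2; when the server sends no option 58 or 59, as in this Offer, those default to 50% and 87.5% of the lease. The sketch below works that arithmetic through. The bind time of 16:56:26 is an assumption, taken from roughly when the ACKs appear in this capture.

```python
from datetime import datetime, timedelta

# RFC 2131 section 4.4.5 defaults, used when the server does not send
# option 58 (renewal time, T1) or option 59 (rebinding time, T2).
T1_FRACTION = 0.5
T2_FRACTION = 0.875


def lease_timers(bound_at: datetime, lease_seconds: int) -> dict:
    """Return the renewal (T1), rebinding (T2) and expiry times for a lease."""
    return {
        "renew_t1": bound_at + timedelta(seconds=lease_seconds * T1_FRACTION),
        "rebind_t2": bound_at + timedelta(seconds=lease_seconds * T2_FRACTION),
        "expires": bound_at + timedelta(seconds=lease_seconds),
    }


if __name__ == "__main__":
    # Option 51 in the Offer: "IP Address Lease Time = 1 day" (86400 s).
    # Assumed bind time: roughly when the ACKs appear in this capture.
    timers = lease_timers(datetime(2011, 10, 6, 16, 56, 26), 86400)
    for name, when in timers.items():
        print(f"{name:10} {when:%Y-%m-%d %H:%M:%S}")
```

With a one-day lease the client would not normally try to renew for twelve hours, which is why the PC's much earlier renewal attempt later in this capture stands out.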


   1388 2011-10-06 16:56:23.261309 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x7f52875c
        *** Netgear asks for confirmation that it can use the address
        Bootstrap Protocol
            Transaction ID: 0x7f52875c
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: Netgear_77:cf:fe (c4:3d:c7:77:cf:fe)
            Option: (t=53,l=1) DHCP Message Type = DHCP Request
            Option: (t=61,l=7) Client identifier
            Option: (t=12,l=8) Host Name = "wnce2001"
            Option: (t=60,l=15) Vendor class identifier = "udhcp 0.9.9-pre"
            Option: (t=50,l=4) Requested IP Address = 192.168.1.74
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.254
            Option: (t=55,l=10) Parameter Request List
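Each `(t=..,l=..)` entry in these dumps is one DHCP option, encoded as a one-byte type code, a one-byte length and then that many bytes of value (RFC 2132), with pad (0) and end (255) as single-byte exceptions. A minimal decoder sketch follows. The RAW byte string is hand-built to mirror four options of the Request above; it is not taken from the capture itself.

```python
import ipaddress

# Option 53 values from RFC 2132 section 9.6.
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}

PAD, END = 0, 255


def parse_options(data: bytes) -> list[tuple[int, bytes]]:
    """Split a DHCP options field (after the magic cookie) into (type, value) pairs."""
    options = []
    i = 0
    while i < len(data):
        code = data[i]
        if code == PAD:      # single padding byte, no length field
            i += 1
            continue
        if code == END:      # end of options
            break
        length = data[i + 1]
        options.append((code, data[i + 2:i + 2 + length]))
        i += 2 + length
    return options


def describe(code: int, value: bytes) -> str:
    """Render a handful of common options; anything else is shown as hex."""
    if code == 53:
        kind = MESSAGE_TYPES.get(value[0], value[0])
        return f"(t=53,l={len(value)}) DHCP Message Type = {kind}"
    if code in (50, 54):     # requested IP address, server identifier
        return f"(t={code},l={len(value)}) {ipaddress.IPv4Address(value)}"
    if code == 12:           # host name
        return f"(t=12,l={len(value)}) Host Name = {value.decode('ascii')!r}"
    return f"(t={code},l={len(value)}) {value.hex()}"


# Hypothetical bytes mirroring options 53, 50, 54 and 12 of the Request above.
RAW = bytes([53, 1, 3,
             50, 4, 192, 168, 1, 74,
             54, 4, 192, 168, 1, 254,
             12, 8]) + b"wnce2001" + bytes([END])

if __name__ == "__main__":
    for code, value in parse_options(RAW):
        print(describe(code, value))
```

Options such as 61 (client identifier) and 55 (parameter request list) would fall through to the hex branch here; a fuller decoder needs a per-option table.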

   *** Slightly different behaviour from the main WLAN router: it is also ARPing at the offer stage
   1389 2011-10-06 16:56:24.123034 ThomsonT_1b:1c:14     Broadcast             ARP      Who has 192.168.1.74?  Tell 192.168.1.254
   1390 2011-10-06 16:56:25.044698 ThomsonT_1b:1c:14     Broadcast             ARP      Who has 192.168.1.74?  Tell 192.168.1.254

   *** ...and we see the Netgear DHCP transaction go on to complete on that IP address
   *** I have a mild concern here about why a wireless bridge with a sophisticated embedded Linux system would require layer 3 membership on my WLAN...
   1391 2011-10-06 16:56:25.309071 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x7f52875c
   1392 2011-10-06 16:56:26.275711 192.168.1.254         255.255.255.255       DHCP     DHCP Offer    - Transaction ID 0x7f52875c
   1393 2011-10-06 16:56:26.279179 192.168.1.254         255.255.255.255       DHCP     DHCP ACK      - Transaction ID 0x7f52875c
   1394 2011-10-06 16:56:26.282191 192.168.1.254         255.255.255.255       DHCP     DHCP ACK      - Transaction ID 0x7f52875c
   1395 2011-10-06 16:56:27.157069 Netgear_77:cf:fe      Broadcast             ARP      Who has 192.168.1.254?  Tell 192.168.1.74
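The transaction ID (xid) is chosen by the client and echoed by the server, so it is what ties each Discover, Offer, Request and ACK together, and it is also how the PC's request that follows (0x7e41999d) can be told apart from the Netgear's exchange. The sketch below groups Wireshark summary lines by xid. The lines are copied from this capture with the column spacing collapsed, and the regular expression assumes Wireshark's summary wording exactly as shown here.

```python
import re
from collections import defaultdict

# Summary lines copied from this capture (spacing collapsed).
CAPTURE = """\
1385 2011-10-06 16:56:23.061375 0.0.0.0 255.255.255.255 DHCP DHCP Discover - Transaction ID 0x7f52875c
1387 2011-10-06 16:56:23.205098 192.168.1.254 255.255.255.255 DHCP DHCP Offer - Transaction ID 0x7f52875c
1388 2011-10-06 16:56:23.261309 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x7f52875c
1393 2011-10-06 16:56:26.279179 192.168.1.254 255.255.255.255 DHCP DHCP ACK - Transaction ID 0x7f52875c
1396 2011-10-06 16:56:28.252537 192.168.1.100 192.168.1.251 DHCP DHCP Request - Transaction ID 0x7e41999d
"""

# Captures the DHCP message type and the hexadecimal transaction ID.
LINE_RE = re.compile(r"DHCP (\w+)\s+- Transaction ID (0x[0-9a-f]+)")


def group_by_xid(text: str) -> dict[str, list[str]]:
    """Map each transaction ID to the ordered list of DHCP message types seen."""
    conversations = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            msg_type, xid = match.groups()
            conversations[xid].append(msg_type)
    return dict(conversations)


if __name__ == "__main__":
    for xid, messages in group_by_xid(CAPTURE).items():
        print(xid, " -> ".join(messages))
```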

   *** But what about our PC? It seems somehow to have twigged that something has changed - DHCP requests (although still unicast), EAPOL etc
   *** I didn't notice Netgear dropping the Ethernet link, but that would have been one way to do it
   1396 2011-10-06 16:56:28.252537 192.168.1.100         192.168.1.251         DHCP     DHCP Request  - Transaction ID 0x7e41999d
   1397 2011-10-06 16:56:36.393373 192.168.1.100         192.168.1.251         ICMP     Echo (ping) request  (id=0x0300, seq(be/le)=17152/67, ttl=1)
   1398 2011-10-06 16:56:36.420095 CompalIn_fb:67:4c     Nearest               EAPOL    Start
   1399 2011-10-06 16:56:37.414054 192.168.1.100         192.168.1.251         ICMP     Echo (ping) request  (id=0x0300, seq(be/le)=17408/68, ttl=1)
   1401 2011-10-06 16:56:38.915020 192.168.1.100         192.168.1.251         ICMP     Echo (ping) request  (id=0x0300, seq(be/le)=17664/69, ttl=1)

   *** After about 12 secs, the PC starts broadcasting its request to extend the lease on its existing IP address
   1402 2011-10-06 16:56:40.418751 192.168.1.100         255.255.255.255       DHCP     DHCP Request  - Transaction ID 0xc0964cdc
        Bootstrap Protocol
            Message type: Boot Request (1)
            Transaction ID: 0xc0964cdc
            Client IP address: 192.168.1.100 (192.168.1.100)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 0.0.0.0 (0.0.0.0)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Request
            Option: (t=61,l=7) Client identifier
            Option: (t=12,l=6) Host Name = "PCNAME"
            Option: (t=81,l=31) Client Fully Qualified Domain Name
            Option: (t=60,l=8) Vendor class identifier = "MSFT 5.0"
            Option: (t=55,l=11) Parameter Request List
            Option: (t=43,l=3) Vendor-Specific Information

   1403 2011-10-06 16:56:40.717438 192.168.1.254         255.255.255.255       DHCP     DHCP NAK      - Transaction ID 0xc0964cdc
        *** But, predictably, the WLAN router says no!

        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0xc0964cdc
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 0.0.0.0 (0.0.0.0)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 192.168.1.254 (192.168.1.254)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP NAK
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.254
            Option: (t=56,l=26) Message = "REQUEST for invalid lease"

   1409 2011-10-06 16:56:45.836342 0.0.0.0               255.255.255.255       DHCP     DHCP Discover - Transaction ID 0xc9b00679
        *** So we start from scratch again
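In RFC 2131 terms, the PC's unicast DHCPREQUEST carrying its old address in ciaddr matches the RENEWING state, the broadcast one matches REBINDING, and a DHCPNAK in either sends the client back to INIT, which is the fresh Discover seen here. The transition table below is a simplified subset of the client state diagram in RFC 2131 (figure 5). The event names are invented labels, and the PC was evidently prompted by the network change rather than by its T1/T2 timers, so those two events only approximate what happened.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    SELECTING = auto()
    REQUESTING = auto()
    BOUND = auto()
    RENEWING = auto()
    REBINDING = auto()


# Subset of the RFC 2131 client transitions; INIT-REBOOT/REBOOTING,
# DHCPDECLINE and lease expiry are left out for brevity.
TRANSITIONS = {
    (State.INIT, "send DISCOVER"): State.SELECTING,
    (State.SELECTING, "recv OFFER"): State.REQUESTING,  # client then sends REQUEST
    (State.REQUESTING, "recv ACK"): State.BOUND,
    (State.REQUESTING, "recv NAK"): State.INIT,
    (State.BOUND, "T1 expires"): State.RENEWING,         # unicast REQUEST
    (State.RENEWING, "recv ACK"): State.BOUND,
    (State.RENEWING, "recv NAK"): State.INIT,
    (State.RENEWING, "T2 expires"): State.REBINDING,     # broadcast REQUEST
    (State.REBINDING, "recv ACK"): State.BOUND,
    (State.REBINDING, "recv NAK"): State.INIT,
}


def run(events, state=State.BOUND):
    """Apply events in order and return every state visited."""
    history = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        history.append(state)
    return history


if __name__ == "__main__":
    # Approximates packets 1396 (unicast REQUEST), 1402 (broadcast REQUEST),
    # 1403 (NAK) and the new exchange starting at 1409.
    steps = ["T1 expires", "T2 expires", "recv NAK",
             "send DISCOVER", "recv OFFER", "recv ACK"]
    print(" -> ".join(s.name for s in run(steps)))
```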

   1410 2011-10-06 16:56:45.939624 192.168.1.254         255.255.255.255       DHCP     DHCP Offer    - Transaction ID 0xc9b00679
        *** New address for use on WLAN

        Bootstrap Protocol
            Message type: Boot Reply (2)
            Transaction ID: 0xc9b00679
            Client IP address: 0.0.0.0 (0.0.0.0)
            Your (client) IP address: 192.168.1.73 (192.168.1.73)
            Next server IP address: 0.0.0.0 (0.0.0.0)
            Relay agent IP address: 192.168.1.254 (192.168.1.254)
            Client MAC address: CompalIn_fb:67:4c (00:1b:38:fb:67:4c)
            Option: (t=53,l=1) DHCP Message Type = DHCP Offer
            Option: (t=54,l=4) DHCP Server Identifier = 192.168.1.254
            Option: (t=51,l=4) IP Address Lease Time = 1 day
            Option: (t=1,l=4) Subnet Mask = 255.255.255.0
            Option: (t=15,l=4) Domain Name = "home"
            Option: (t=6,l=4) Domain Name Server = 192.168.1.254
            Option: (t=3,l=4) Router = 192.168.1.254

   *** And then the transaction completes as before
   1411 2011-10-06 16:56:45.940058 0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0xc9b00679
   1412 2011-10-06 16:56:46.246842 192.168.1.254         255.255.255.255       DHCP     DHCP ACK      - Transaction ID 0xc9b00679
   1414 2011-10-06 16:56:46.273772 CompalIn_fb:67:4c     Broadcast             ARP      Gratuitous ARP for 192.168.1.73 (Request)
   1415 2011-10-06 16:56:46.421869 CompalIn_fb:67:4c     Nearest               EAPOL    Start
   1416 2011-10-06 16:56:46.914798 CompalIn_fb:67:4c     Broadcast             ARP      Gratuitous ARP for 192.168.1.73 (Request)
   1420 2011-10-06 16:56:47.471850 CompalIn_fb:67:4c     ThomsonT_1b:1c:14     ARP      192.168.1.73 is at 00:1b:38:fb:67:4c
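The router's check for 192.168.1.74 earlier and the PC's gratuitous ARPs for 192.168.1.73 here use the same 28-byte ARP payload (RFC 826); the difference is that a gratuitous ARP names the sender's own IP address as the target. The sketch below packs both with Python's struct module, using field values from the ARP dump above. It assumes the PC left the target MAC as zeros, which the summary lines do not show.

```python
import ipaddress
import struct

# RFC 826 ARP payload for Ethernet/IPv4: 28 bytes in network byte order.
# htype, ptype, hlen, plen, opcode, sender MAC/IP, target MAC/IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"
HTYPE_ETHERNET, PTYPE_IPV4 = 0x0001, 0x0800
OP_REQUEST = 1


def mac(text: str) -> bytes:
    return bytes.fromhex(text.replace(":", ""))


def ip(text: str) -> bytes:
    return ipaddress.IPv4Address(text).packed


def arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request; the target MAC is unknown, so it is all zeros."""
    return struct.pack(ARP_FORMAT, HTYPE_ETHERNET, PTYPE_IPV4, 6, 4, OP_REQUEST,
                       mac(sender_mac), ip(sender_ip),
                       bytes(6), ip(target_ip))


def is_gratuitous(payload: bytes) -> bool:
    """True when the sender and target IP addresses are identical."""
    fields = struct.unpack(ARP_FORMAT, payload)
    return fields[6] == fields[8]


# The router asking "Who has 192.168.1.74? Tell 192.168.1.254".
probe = arp_request("00:26:44:1b:1c:14", "192.168.1.254", "192.168.1.74")
# The PC announcing its new address (packets 1414 and 1416).
announce = arp_request("00:1b:38:fb:67:4c", "192.168.1.73", "192.168.1.73")

if __name__ == "__main__":
    print(len(probe), is_gratuitous(probe), is_gratuitous(announce))
```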


   *** Lastly, of mild interest only, is the difference in behaviour between the Netgear DNS and the broadband router DNS
   *** This is now back to normal behaviour
   1447 2011-10-06 16:56:49.006741 192.168.1.73          192.168.1.254         DNS      Standard query SRV _ldap._tcp.WORKCampus._sites.dc._msdcs.subdom.workdomain.co.uk
   1449 2011-10-06 16:56:49.046225 192.168.1.254         192.168.1.73          DNS      Standard query response, No such name
   1450 2011-10-06 16:56:49.062867 192.168.1.73          192.168.1.254         DNS      Standard query SRV _ldap._tcp.dc._msdcs.subdom.workdomain.co.uk
   1451 2011-10-06 16:56:49.071474 192.168.1.73          192.168.1.254         DNS      Standard query SOA PCNAME.subdom.workdomain.co.uk
   1452 2011-10-06 16:56:49.094456 192.168.1.254         192.168.1.73          DNS      Standard query response, No such name
   1453 2011-10-06 16:56:49.101329 192.168.1.254         192.168.1.73          DNS      Standard query response, No such name

So quite a slick operation, but one that requires me to be happy having a third-party Linux server sitting on my network disguised as a dumb transceiver. As far as I could see (from my broadband router's ARP table), the server went to sleep once it had done its job of creating a WLAN bridge, but who knows if it could be woken up...

Monday, 19 September 2011

Nagios Plugin 'check_procs' incorrectly finds 0 processes

When checking for running processes on remote Linux systems via NRPE, the Nagios plugin check_procs -C <process command name> occasionally returns unexpected results.

Example on a ZENworks 7 server:

If we look for the tftpd daemon using ps:

ZEN03:/usr/local/nagios/libexec # ps -ef|grep tftpd
root 4103 1 0 Sep14 ? 00:01:31 /opt/novell/bin/novell-tftpd
root 20047 17950 0 13:06 pts/0 00:00:00 grep tftpd

Then we look for it with check_procs:

ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-tftpd
PROCS OK: 1 process with command name 'novell-tftpd'

The check_procs plugin correctly reports that one process with this name has been found.

However, if we look for the proxy DHCP daemon using ps:

ZEN03:/usr/local/nagios/libexec # ps -ef |grep proxy
root 21171 1 0 Sep18 ? 00:00:00 /opt/novell/bin/novell-proxydhcpd
root 20137 17950 0 13:07 pts/0 00:00:00 grep proxy

And then with check_procs:

ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-proxydhcpd
PROCS OK: 0 processes with command name 'novell-proxydhcpd'

In this case, the check_procs plugin has reported that 0 processes have been found, even though we can clearly see that this is not the case.

The trick in these situations is to ask check_procs for more information using the -vv switch:

ZEN03:/usr/local/nagios/libexec # ./check_procs -vv -C novell-proxydhcpd
CMD: /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'
PROCS OK: 0 processes with command name 'novell-proxydhcpd'

Here check_procs shows us the exact ps command it runs to gather the information it reports back to us.

So let us use that parameter list for our own check:

ZEN03:/usr/local/nagios/libexec # /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'|grep proxy
S 0 21171 1 1412 396 0.0 novell-proxydhc /opt/novell/bin/novell-proxydhcpd
S+ 0 20263 17950 1712 672 0.0 grep grep proxy

And there it is: on this system (SLES9), ps reports only the first 15 characters of the command name in the comm column, and since check_procs -C matches against that column, no match is found.

So what we have to do is ask check_procs to look for just those first 15 characters:
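The mismatch can be reproduced with a short sketch that compares a name against the comm column, approximately as check_procs -C does. This is a simplification for illustration, not the plugin's actual code. The 15-character limit comes from the Linux kernel, which stores each process's command name in a 16-byte buffer (TASK_COMM_LEN) including the terminating NUL. comm_name() is a hypothetical helper encoding a rule of thumb (take the executable's basename and cut it to 15 characters); it does not hold for processes that rename themselves.

```python
from dataclasses import dataclass

# The kernel keeps a process's command name in a 16-byte buffer
# (TASK_COMM_LEN), leaving room for 15 visible characters.
COMM_MAX = 15


@dataclass
class Proc:
    stat: str
    uid: int
    pid: int
    ppid: int
    vsz: int
    rss: int
    pcpu: float
    comm: str
    args: str


def parse_ps_line(line: str) -> Proc:
    """Parse one line of `ps axwo 'stat uid pid ppid vsz rss pcpu comm args'`.

    Assumes comm contains no spaces; everything after it is args.
    """
    stat, uid, pid, ppid, vsz, rss, pcpu, comm, args = line.split(None, 8)
    return Proc(stat, int(uid), int(pid), int(ppid), int(vsz), int(rss),
                float(pcpu), comm, args)


def count_by_command(lines, name: str) -> int:
    """Count processes whose comm column equals name exactly."""
    return sum(1 for line in lines if parse_ps_line(line).comm == name)


def comm_name(executable: str) -> str:
    """Hypothetical helper: basename of the executable cut to 15 characters."""
    return executable.rsplit("/", 1)[-1][:COMM_MAX]


# The two lines of ps output shown above.
PS_OUTPUT = [
    "S 0 21171 1 1412 396 0.0 novell-proxydhc /opt/novell/bin/novell-proxydhcpd",
    "S+ 0 20263 17950 1712 672 0.0 grep grep proxy",
]

if __name__ == "__main__":
    print(count_by_command(PS_OUTPUT, "novell-proxydhcpd"))
    print(count_by_command(PS_OUTPUT, comm_name("/opt/novell/bin/novell-proxydhcpd")))
```

The full name only appears in the args column, which is why matching on comm finds nothing until the name is cut to 15 characters.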

ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-proxydhc
PROCS OK: 1 process with command name 'novell-proxydhc'

We can now correctly check for our proxy DHCP daemon in Nagios.