The Discovery Challenge
When you're responsible for discovering thousands of servers and databases across an enterprise, every percentage point of coverage matters. Here's how I raised overall Discovery coverage from 52% to 90%.
Starting Point: The Gap Analysis
Initial State
- Server Discovery: 60% coverage
- Database Discovery: 40% coverage
- Network Devices: 55% coverage
- Common Issues: Credential failures, timeouts, pattern errors
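Tracking the common issues above means bucketing raw failure messages consistently. Below is a minimal sketch in plain JavaScript, run outside the instance (e.g. in Node.js). It assumes failure messages have already been exported from the instance. The keyword rules and the names `FAILURE_RULES`, `classifyFailure`, and `failureBreakdown` are hypothetical, and real error text varies by probe and release.

```javascript
// Hypothetical keyword rules mapping raw error text to the issue buckets above.
// Rules are checked in order, so the first match wins.
const FAILURE_RULES = [
  { category: "credential", pattern: /auth(entication)?\s+fail|access denied|permission denied|invalid credential/i },
  { category: "timeout", pattern: /timed?\s*out/i },
  { category: "pattern", pattern: /pattern|identification failed|no classifier/i },
];

// Returns the first matching category, or "other" when no rule applies.
function classifyFailure(message) {
  for (const rule of FAILURE_RULES) {
    if (rule.pattern.test(message)) return rule.category;
  }
  return "other";
}

// Counts failures per category and reports each category's share (0-100).
function failureBreakdown(messages) {
  const counts = {};
  for (const msg of messages) {
    const category = classifyFailure(msg);
    counts[category] = (counts[category] || 0) + 1;
  }
  const shares = {};
  for (const [category, count] of Object.entries(counts)) {
    shares[category] = (100 * count) / messages.length;
  }
  return { counts, shares };
}
```

Keeping the rules in one ordered table makes it easy to add a bucket when a new recurring error shows up.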
Systematic Troubleshooting Approach
1. Credential Management
Problem: 40% of failed discoveries were credential-related.
Solutions:
- Centralized credential store
- Automated credential validation
- Least-privilege discovery accounts
- Regular credential rotation testing
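// Credential Validation Script (sketch)
// Replace the placeholder with a real sys_id from discovery_credentials.
var credSysId = 'REPLACE_WITH_CREDENTIAL_SYS_ID';
var cred = new GlideRecord('discovery_credentials');
if (cred.get(credSysId)) {
    // DiscoveryCredentialTest is treated here as a custom Script Include
    // (hypothetical); adjust to the validation helper you actually use.
    var testResult = new DiscoveryCredentialTest();
    testResult.test(cred);
    if (!testResult.isValid()) {
        gs.log('Credential validation failed for ' + cred.getDisplayValue() + ': ' + testResult.getError());
        // Alert credential owner
    }
} else {
    gs.log('Credential not found: ' + credSysId);
}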
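The script above tests a single credential. Regular rotation testing also means deciding which credentials are due for a re-test. Below is a minimal sketch in plain JavaScript. It assumes test results are exported with the hypothetical fields `name`, `lastValidated`, and `lastResult`; these are not ServiceNow field names.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the names of credentials that were never validated, whose last test
// failed, or whose last test is older than maxAgeDays.
function credentialsNeedingAttention(credentials, maxAgeDays, now = new Date()) {
  return credentials
    .filter((c) => {
      if (!c.lastValidated) return true; // never tested
      if (c.lastResult !== "valid") return true; // last test failed
      const ageDays = (now.getTime() - new Date(c.lastValidated).getTime()) / MS_PER_DAY;
      return ageDays > maxAgeDays;
    })
    .map((c) => c.name);
}
```

The output list can feed the "alert credential owner" step, so that stale credentials get re-tested before they break a scheduled run.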
2. MID Server Optimization
Issues Found:
- MID servers running out of memory
- Network latency issues
- Concurrent probe limits
Optimizations:
- Increased the MID Server JVM heap size to 4 GB
- Distributed MID servers by network segment
- Tuned concurrent probe settings
- Implemented MID server health monitoring
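Distributing MID Servers by network segment comes down to deciding which MID Server owns which IP range. The sketch below shows longest-prefix matching of a target IPv4 address against segment CIDRs, which can help when planning or auditing that layout. It is not the platform's own MID Server selection logic, and the MID Server names and ranges are hypothetical.

```javascript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True when ip falls inside cidr, e.g. "10.1.0.0/16".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  // JS shifts are mod 32, so a /0 mask needs a special case.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// Picks the MID Server whose segment is the most specific (longest prefix) match,
// or null when no segment covers the address.
function selectMidServer(ip, midServers) {
  let best = null;
  let bestBits = -1;
  for (const mid of midServers) {
    for (const cidr of mid.segments) {
      const bits = Number(cidr.split("/")[1]);
      if (bits > bestBits && inCidr(ip, cidr)) {
        best = mid.name;
        bestBits = bits;
      }
    }
  }
  return best;
}
```

Longest-prefix matching lets a dedicated MID Server for a small segment, such as a DMZ /24, take precedence over a broader data-center range that also contains it.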
3. Pattern Enhancement
Approach:
- Reviewed failed discovery attempts
- Identified pattern gaps
- Created custom patterns for edge cases
- Collaborated with vendors on pattern improvements
Example: Custom Database Discovery Pattern
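<!-- Simplified, illustrative outline only: Discovery patterns are built in Pattern Designer, so these element names are not real pattern syntax. -->
<pattern name="Custom_Oracle_Discovery">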
<probe name="oracle_listener">
<port>1521</port>
<timeout>30</timeout>
</probe>
<parser>
<parse_version>
<!-- Extract Oracle version -->
</parse_version>
</parser>
</pattern>
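The `<parse_version>` step above only says that the Oracle version should be extracted. Below is a minimal sketch of that parsing step in plain JavaScript. It assumes the probe returns a version banner as text, prefers a `Version x.y...` token, and falls back to `Release x.y...`. The banner strings in the tests are illustrative examples, not captured output.

```javascript
// Extracts a dotted version number from an Oracle banner string.
// Returns { full, major } or null when no version token is found.
function parseOracleVersion(banner) {
  const match =
    /Version\s+(\d+(?:\.\d+)+)/.exec(banner) || /Release\s+(\d+(?:\.\d+)+)/.exec(banner);
  if (!match) return null;
  return { full: match[1], major: Number(match[1].split(".")[0]) };
}
```

Returning `null` instead of throwing lets the caller log the raw banner as a pattern gap, which feeds back into the failed-attempt review above.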
4. Network Access Review
Findings:
- Firewall rules blocking discovery
- Port access issues
- Network segmentation challenges
Resolution:
- Documented all required ports
- Worked with network team for rule updates
- Created network access matrix
- Implemented regular connectivity testing
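A network access matrix is easiest to keep accurate when connectivity test results are checked against it automatically. Below is a minimal sketch in plain JavaScript. It assumes test results arrive as a set of reachable `port/protocol` strings per target. The matrix entries are common default ports used for illustration only: SSH 22/tcp, WinRM over HTTP 5985/tcp, SNMP 161/udp, and the Oracle listener port 1521 from the pattern above. Substitute your own documented ports.

```javascript
// Illustrative access matrix: discovery method -> required port/protocol pairs.
const ACCESS_MATRIX = {
  ssh: [{ port: 22, protocol: "tcp" }],
  winrm: [{ port: 5985, protocol: "tcp" }],
  snmp: [{ port: 161, protocol: "udp" }],
  oracle: [{ port: 1521, protocol: "tcp" }],
};

// Compares one target's connectivity results against the matrix.
// `open` is a Set of "port/protocol" strings observed reachable from the MID Server.
function missingAccess(methods, open) {
  const missing = [];
  for (const method of methods) {
    const required = ACCESS_MATRIX[method];
    if (!required) throw new Error(`No matrix entry for method: ${method}`);
    for (const { port, protocol } of required) {
      const key = `${port}/${protocol}`;
      if (!open.has(key)) missing.push({ method, key });
    }
  }
  return missing;
}
```

Each `{ method, key }` entry maps directly to a firewall rule request for the network team.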
Discovery Scheduling Strategy
Optimized Schedule
- Low-Impact Windows: 2-6 AM for production systems
- Staggered Starts: Avoid overwhelming MID servers
- Incremental Discovery: Daily incremental scans, weekly full scans
- Priority-Based: Critical systems first
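Staggering and priority ordering can be combined into one simple rule: sort jobs by priority, then space their start times evenly from the start of the low-impact window. Below is a minimal sketch in plain JavaScript. It assumes hypothetical `{ name, priority }` jobs, where 1 is the most critical, and a 15-minute stagger. It throws rather than silently scheduling a job past the end of the window.

```javascript
// Builds staggered start times inside a low-impact window, highest priority first.
function buildSchedule(jobs, windowStartHour = 2, windowEndHour = 6, staggerMinutes = 15) {
  const windowMinutes = (windowEndHour - windowStartHour) * 60;
  // Copy before sorting so the caller's array is untouched; sort is stable.
  const ordered = [...jobs].sort((a, b) => a.priority - b.priority);
  return ordered.map((job, i) => {
    const offset = i * staggerMinutes;
    if (offset >= windowMinutes) {
      throw new Error(`Too many jobs for the window: ${job.name} does not fit`);
    }
    const total = windowStartHour * 60 + offset;
    const hh = String(Math.floor(total / 60)).padStart(2, "0");
    const mm = String(total % 60).padStart(2, "0");
    return { name: job.name, start: `${hh}:${mm}` };
  });
}
```

With a 2-6 AM window and a 15-minute stagger, at most 16 jobs fit. Beyond that, either shorten the stagger or spread jobs across more MID Servers.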
Results: The Numbers
Coverage Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| Servers | 60% | 95% | +35 points |
| Databases | 40% | 85% | +45 points |
| Network Devices | 55% | 90% | +35 points |
| Overall Coverage | 52% | 90% | +38 points |
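The Overall Coverage row matches the unweighted mean of the three category rows: (60 + 40 + 55) / 3 ≈ 51.7, shown as 52%, and (95 + 85 + 90) / 3 = 90%. When category sizes differ a lot, a figure weighted by CI count can come out quite differently, so it helps to state which one you report. Below is a minimal sketch of both calculations in plain JavaScript. The category counts in the tests are hypothetical.

```javascript
// Coverage for one category: discovered CIs as a percentage of expected CIs.
function coveragePct(discovered, expected) {
  return expected === 0 ? 0 : (100 * discovered) / expected;
}

// Unweighted mean of category percentages (each category counts equally).
function overallUnweighted(percentages) {
  return percentages.reduce((sum, p) => sum + p, 0) / percentages.length;
}

// Weighted overall: total discovered over total expected across categories.
function overallWeighted(categories) {
  const discovered = categories.reduce((s, c) => s + c.discovered, 0);
  const expected = categories.reduce((s, c) => s + c.expected, 0);
  return coveragePct(discovered, expected);
}
```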
Business Impact
- Compliance: Audit-ready asset inventory
- Security: Complete visibility for vulnerability management
- Cost: Accurate licensing and capacity planning
- Risk: Identification of shadow IT
Key Learnings
- Start with the Basics: Credentials and network access
- Monitor Everything: You can't fix what you can't see
- Collaborate: Network, security, and server teams are essential
- Iterate: Small improvements compound over time
- Document: Knowledge transfer is critical
Tools and Scripts
Discovery Health Dashboard
Created custom dashboards tracking:
- Discovery success rate by device type
- Failed discovery reasons (top 10)
- MID server performance metrics
- Pattern usage and success rates
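The first two dashboard metrics reduce to simple aggregations over discovery results. Below is a minimal sketch in plain JavaScript. It assumes results are exported as hypothetical `{ deviceType, success, reason }` records, with `reason` set only on failures. On the instance itself, the same numbers could instead come from reports or aggregate queries.

```javascript
// Success rate (0-100) per device type.
function successRateByType(results) {
  const tally = new Map();
  for (const r of results) {
    const t = tally.get(r.deviceType) || { ok: 0, total: 0 };
    t.total += 1;
    if (r.success) t.ok += 1;
    tally.set(r.deviceType, t);
  }
  const rates = {};
  for (const [type, { ok, total }] of tally) rates[type] = (100 * ok) / total;
  return rates;
}

// Top-N failure reasons by count; ties are broken alphabetically for stable output.
function topFailureReasons(results, n = 10) {
  const counts = new Map();
  for (const r of results) {
    if (r.success) continue;
    const reason = r.reason || "unknown";
    counts.set(reason, (counts.get(reason) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, n)
    .map(([reason, count]) => ({ reason, count }));
}
```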
Automated Alerts
Implemented proactive monitoring:
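// Alert on Discovery Failures (sketch)
// Assumptions to verify on your instance: the 'status' field and 'failed'
// value on discovery_status, the threshold, and the event name, which is
// hypothetical and must be registered in the Event Registry.
var threshold = 25; // hypothetical; tune to your environment
var agg = new GlideAggregate('discovery_status');
agg.addQuery('status', 'failed');
agg.addQuery('sys_created_on', '>', gs.daysAgoStart(1));
agg.addAggregate('COUNT');
agg.query();
var failedCount = agg.next() ? parseInt(agg.getAggregate('COUNT'), 10) : 0;
if (failedCount > threshold) {
    // Queue an event; a notification subscribed to it alerts the discovery team
    gs.eventQueue('discovery.failures.threshold', null, failedCount, threshold);
}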
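The alert script above compares the daily failure count against a threshold without saying how the threshold is chosen. One option, not necessarily the one used here, is to derive it from recent history: the mean plus k standard deviations, with a floor so that quiet periods do not produce a near-zero threshold. Below is a minimal sketch in plain JavaScript. The daily counts would come from a query like the one above, and the defaults `k = 3` and `floor = 10` are hypothetical.

```javascript
// Alert threshold from recent daily failure counts: ceil(mean + k * stddev),
// never lower than `floor`. Uses population standard deviation.
function dynamicThreshold(dailyCounts, k = 3, floor = 10) {
  if (dailyCounts.length === 0) return floor;
  const mean = dailyCounts.reduce((s, x) => s + x, 0) / dailyCounts.length;
  const variance = dailyCounts.reduce((s, x) => s + (x - mean) ** 2, 0) / dailyCounts.length;
  return Math.max(floor, Math.ceil(mean + k * Math.sqrt(variance)));
}

// True when today's count exceeds the history-based threshold.
function shouldAlert(todayCount, dailyCounts, k, floor) {
  return todayCount > dynamicThreshold(dailyCounts, k, floor);
}
```

A history-based threshold adapts as coverage grows and the normal failure volume shifts, whereas a fixed number tends to drift toward either noise or silence.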
Next Steps
Current focus areas:
- Cloud resource discovery (AWS, Azure)
- Application dependency mapping
- Service model automation
- Predictive discovery scheduling
Conclusion
Improving Discovery coverage isn't magic—it's methodical troubleshooting, cross-team collaboration, and continuous optimization. The key is treating Discovery as a program, not a project.
What's your biggest Discovery challenge? Let's discuss in the comments!