VictorOps is an incident response package. The service can be integrated to work with other IT management systems. VictorOps is now a division of Splunk.
About VictorOps
VictorOps was founded in 2012 and made its headquarters in Boulder, Colorado, in the United States. In June 2018, the company was bought by Splunk.
The VictorOps system is classified as an Incident Management service. It acts as a hub for alerts. It interfaces with problem detection systems and forwards the notifications they raise to development teams as alerts. The VictorOps system doesn’t identify problems, nor does it manage their resolution. The main market for VictorOps is DevOps teams.
The system isn’t designed for use as a Help Desk environment. VictorOps doesn’t include a ticketing system or team management functions. However, it is possible to enter specialist skills for individual team members or groups that enable the system to send notifications to contacts when a problem arises.
Since VictorOps became part of Splunk, its name has been changed to Splunk On-Call. So, if you are looking for VictorOps, the new name would explain why it has been hard to find.
Splunk On-Call
Splunk On-Call is particularly useful for IT Operations teams that support vital systems that work around the clock, such as the IT infrastructure of the emergency services, a process flow control system for gas supply, or an automated factory. In these environments, “out of hours” doesn’t exist. However, for practical staffing reasons, the night shift might not include every type of system expertise. In those cases, the specialists won’t be in the office but will be “on call.”
Another scenario lies with outsourced services. For example, the management teams of vital systems often place contracts for maintenance support with specialized consultancies. When there are different contacts for different system specializations, or a range of contacts with different service providers for different aspects of the system, switching the incident notification to the right destination is an important task.
These are the functions that Splunk On-Call performs. It requires those links between incident types and responders to be set up. Fitting the service to complicated systems can make that setup task a time-consuming step. However, this is simply a way of codifying the memory of contracts, agreements, and plans that many system managers hold in their heads.
Centralizing and documenting the contact information and the decision-making processes that IT Operations managers use when deciding who to call in an emergency provides continuity in the event that the people who carry that knowledge are absent through leave or illness or leave the business suddenly.
The process flow of Splunk On-Call’s operations is shown below.
As can be seen from the diagram, the system can receive alert messages through Slack and Microsoft Teams. That means that any monitoring service that can generate notifications through either of those channels can work with Splunk On-Call.
Once Splunk On-Call receives a notification, it checks through its database of actions to perform and forwards those alerts to the appropriate person. Those forwarded alerts can also be sent using Slack or Microsoft Teams.
On-Call Essentials
The heart of the Splunk On-Call service is its database of contacts. The right person to call for a specific problem might be different at different times of the day. It could occasionally be necessary to contact another person if the primary contact is away. Splunk On-Call makes it possible to record several different people as the contact for the same type of problem and to choose between them through a schedule. So, not only is it necessary to enter contact information, but the system also needs to know when each person is responsible. A schedule calendar in the settings of the On-Call system handles this issue.
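The schedule lookup described above can be illustrated with a short sketch. This is not Splunk On-Call's implementation or API. The `Shift` structure, the `on_call_at` function, and the contact names are all hypothetical, and the sketch assumes a simple daily rota rather than a full calendar.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Shift:
    """One rota entry (hypothetical structure): who covers which daily window."""
    contact: str
    start: time  # inclusive
    end: time    # exclusive; an end earlier than start means the shift crosses midnight


def covers(shift: Shift, moment: time) -> bool:
    """Return True if the moment falls inside the shift's window."""
    if shift.start <= shift.end:
        return shift.start <= moment < shift.end
    # Overnight shift, for example 18:00 to 08:00
    return moment >= shift.start or moment < shift.end


def on_call_at(rota: list[Shift], when: datetime) -> Optional[str]:
    """Return the first contact whose shift covers the given time, or None."""
    for shift in rota:
        if covers(shift, when.time()):
            return shift.contact
    return None


rota = [
    Shift("day-engineer", time(8, 0), time(18, 0)),
    Shift("night-engineer", time(18, 0), time(8, 0)),
]
print(on_call_at(rota, datetime(2024, 1, 15, 3, 30)))  # night-engineer
```

The point of the sketch is that the contact record alone is not enough: the lookup needs both the person and the window in which that person is responsible.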
Another problem that system managers face is that the primary contact doesn’t always respond. The On-Call package includes automated escalation, which implements a second notification after a time delay. This might involve contacting the person in charge of the organization or department to which the primary contact belongs.
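A minimal sketch of time-delayed escalation, assuming a policy is just an ordered list of contacts with delays. The `EscalationStep` class, the function name, and the delay values are invented for illustration and do not come from the Splunk On-Call product.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    """Hypothetical policy entry: notify this contact once the delay has passed."""
    contact: str
    after_minutes: int  # minutes since the alert was raised without acknowledgment


def contacts_to_notify(policy: list[EscalationStep], minutes_unacknowledged: int) -> list[str]:
    """Return every contact who should have been notified by now, earliest first."""
    due = [step for step in policy if step.after_minutes <= minutes_unacknowledged]
    return [step.contact for step in sorted(due, key=lambda step: step.after_minutes)]


policy = [
    EscalationStep("primary-engineer", 0),
    EscalationStep("department-head", 15),
]
print(contacts_to_notify(policy, 5))   # ['primary-engineer']
print(contacts_to_notify(policy, 20))  # ['primary-engineer', 'department-head']
```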
Webhooks allow additional actions to be performed automatically at the same time that an alert is forwarded. For example, such actions could be bouncing the server or displaying a status page on a website.
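In general terms, a webhook is an HTTP POST that carries details of the alert to another system. The sketch below only builds such a request with the Python standard library and does not send it. The URL and the payload field names are placeholders, not the format that Splunk On-Call actually uses.

```python
import json
from urllib.request import Request


def build_webhook_request(url: str, alert: dict) -> Request:
    """Build (but do not send) a JSON POST request describing an alert."""
    body = json.dumps(alert).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and fields, chosen for illustration only
request = build_webhook_request(
    "https://status.example.com/hooks/incident",
    {"incident": "db-latency", "state": "triggered"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would deliver it. What the receiving system does with it, such as updating a status page, is up to that system.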
Rapid Response
The incident dashboard of Splunk On-Call gives a live log of all events that relate to an alert that was passed through the system. This report shows the members of the team that has been allocated the alert and a log of communications made through any messaging system connected to the On-Call system.
The manager in charge of the threatened system can activate a conference call through the On-Call dashboard to check up on progress. The system stores the phone numbers of each assigned team member and automatically groups and dials them without the manager needing to see each number.
The system manager can also inform stakeholders that the team is aware of the problem and report progress on a solution. All of the personal and automated actions undertaken by the threatened system’s management team are logged in a timeline, which is a crucial log that will form a part of SLA conformance documentation.
Incident Automation
The core value of VictorOps is its ability to centralize the distribution of responsibilities related to a system problem. This central point of processing for alerts enables the tool to identify similar notifications. For example, if a production line stops moving, sensors at several points on the factory floor will trigger alerts. Rather than just passing through a flood of alerts, the VictorOps system merges all of them into one notification.
The reports allotted to a group of alerts preserve all of the original incoming alarms. This information is helpful as it comes from live monitoring systems. A systems engineer can quickly apply logic to identify the actual point of failure by looking at where all of the notifications came from.
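The merging behavior can be pictured as grouping alerts by a shared key while keeping every original alert attached to the group. The following is a sketch under that assumption. The `incident_key` field and the class names are hypothetical, and the real product may decide which alerts are similar in a different way.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str        # the sensor or monitor that raised the alert
    incident_key: str  # hypothetical field that ties related alerts together
    message: str


@dataclass
class Incident:
    incident_key: str
    alerts: list = field(default_factory=list)  # original alerts are preserved

    def sources(self) -> list:
        """List where each underlying alert came from, in arrival order."""
        return [alert.source for alert in self.alerts]


def merge_alerts(alerts: list) -> list:
    """Group alerts that share an incident key into one incident each."""
    grouped = {}
    for alert in alerts:
        if alert.incident_key not in grouped:
            grouped[alert.incident_key] = Incident(alert.incident_key)
        grouped[alert.incident_key].alerts.append(alert)
    return list(grouped.values())


incoming = [
    Alert("sensor-A", "line-1-stopped", "belt speed 0"),
    Alert("sensor-B", "line-1-stopped", "no items detected"),
    Alert("sensor-C", "line-1-stopped", "motor idle"),
]
incidents = merge_alerts(incoming)
print(len(incidents), incidents[0].sources())  # 1 ['sensor-A', 'sensor-B', 'sensor-C']
```

Because the incident keeps the source of every alert, an engineer can still see which sensors reported and reason about where the failure started.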
The notification routing is driven by the Alert Rules in VictorOps. These rules all have to be set up, so the operator in charge of setting those rules must know precisely how to frame the alert format and related triggers. Each rule pairs an incident type with an action to perform in response.
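As a rough illustration of a rule as a pairing of a condition and an action, the sketch below matches one field of an alert against a wildcard pattern and returns a routing target. The `Rule` class, the field names, and the team names are assumptions made for this example. They are not the rule syntax used in the product.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    """Hypothetical rule: if the named alert field matches the pattern, route the alert."""
    alert_field: str  # which field of the alert to inspect
    pattern: str      # shell-style wildcard, for example "db-*"
    route_to: str     # the team or contact that should receive the alert


def route(alert: dict, rules: list[Rule], default: str = "default-team") -> str:
    """Return the target of the first matching rule, or the default target."""
    for rule in rules:
        value = str(alert.get(rule.alert_field, ""))
        if fnmatchcase(value, rule.pattern):
            return rule.route_to
    return default


rules = [
    Rule("host", "db-*", "database-team"),
    Rule("severity", "critical", "duty-manager"),
]
print(route({"host": "db-primary-01", "severity": "warning"}, rules))  # database-team
print(route({"host": "web-03", "severity": "critical"}, rules))        # duty-manager
```

The first matching rule wins in this sketch, which shows why the accuracy of the rules matters: a pattern that is too broad or listed in the wrong order sends the alert to the wrong person.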
Although automation forwards alerts without manual intervention, the system manager can add notes to each as they are delivered to the responsible contact. It is also possible to set up attachments and boilerplate text for each type of alert. These notes might include safety instructions and liability notifications or training and troubleshooting guides.
Delivery Insights
The Delivery Insights module is an attractive feature for DevOps teams that operate a CI/CD pipeline. This analytical feature helps development team managers see whether the business is wasting too much money on poorly tested code that goes into production before it has been verified.
Leaving fixes until a module is already in production can be expensive. Unpicking existing systems to get down to a procedural error and remap it can impact related systems. Letting incorrect code go live damages the business’s reputation by leaving clients and members of the public with the impression that the company cannot deliver its services. In the light of those failures, potential customers might wonder about the quality of service they can expect.
VictorOps deployment options
VictorOps is no longer available as an independent product. You need to look at Splunk On-Call instead.
Splunk On-Call is a SaaS platform. There is no on-premises version. The On-Call service includes the VictorOps software, the processing power to run it, and storage space to hold logs and statistics. Subscribers to AWS and Azure virtual server plans can add on Splunk services in the Marketplace of their preferred platform.
Splunk On-Call price
The pricing of Splunk On-Call is a little complicated because it isn’t offered as a separate module. Instead, it is an Add-on feature to the Splunk Observability Cloud package.
Splunk Observability Cloud is a package that includes Splunk Infrastructure Monitoring and Splunk APM, an application performance monitor. The bundle also includes Splunk Log Observer, which is a log manager and data searching product.
There are two plans for Splunk Observability Cloud and the lowest of these, called Standard, starts at $95 per month per host when billed annually. The higher plan, called Plus, also includes Splunk RUM, a package of tools to analyze live websites, and its starting price is $110 per month per host when billed annually.
The Splunk On-Call Add-on has a starting price of $5 per user per month when billed annually. In addition, you can get a 14-day free trial of Splunk On-Call and all other Splunk modules.
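To see how the per-host and per-user charges combine, here is a small worked calculation using the starting prices quoted above. The host and user counts are made-up figures, and because these are starting prices, the result should be read as a lower bound and not as a quote.

```python
# Starting prices quoted above, in US dollars per month when billed annually
STANDARD_PER_HOST = 95
PLUS_PER_HOST = 110
ON_CALL_PER_USER = 5


def annual_cost(hosts: int, on_call_users: int, per_host: int = STANDARD_PER_HOST) -> int:
    """Estimate the minimum yearly cost of Observability Cloud plus the On-Call add-on."""
    monthly = hosts * per_host + on_call_users * ON_CALL_PER_USER
    return monthly * 12


# Example: 10 monitored hosts and 5 on-call users (illustrative numbers)
print(annual_cost(10, 5))                          # 11700
print(annual_cost(10, 5, per_host=PLUS_PER_HOST))  # 13500
```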
Splunk On-Call strengths and weaknesses
VictorOps, now called Splunk On-Call, has its niche as a notification manager for system error management. However, this tool doesn’t operate independently. All it does is pass through alerts that other software packages have raised. We have identified some strengths and weaknesses in Splunk On-Call.
Pros:
- Suitable for use by DevOps teams, checking on in-house functions once they go live
- Automates the notification process for people responsible for supporting systems that are in error
- Integrates with other Splunk products and any monitor that can send out notifications through Slack or Teams
- Extensive activity documentation for SLA compliance reporting
- Detects similar reports that relate to the same incident
Cons:
- Provides functionality that alert-raising monitoring tools already implement
- The quality of the system’s incident routing relies on the ability of the user to create accurate rules
Alternatives to VictorOps
Finding alternatives to VictorOps is a difficult task because, in many cases, the alternative to VictorOps is not to use anything. VictorOps routes alerts to specific people and records the notification events. However, many of the monitoring tools that could feed into VictorOps also let you write routing rules for directing alerts to the right person.
We have found several tools that are very good for supporting DevOps teams in error detection during the transition to production and once new functions and Web pages are live.
Here is our list of the five best alternatives to VictorOps.
- SolarWinds Service Desk (FREE TRIAL) This package includes team management and task management features in its ticketing system. Just like VictorOps, SolarWinds Service Desk can integrate with Jira for project management and Slack for notifications. You can get monitoring alerts fed into the ticketing system and set up routes to let the system automatically allocate work. In addition, it will track progress and give each technician a task list with deadlines. SolarWinds Service Desk is a SaaS system, and you can access it on a 30-day free trial.
- Datadog APM + Continuous Profiler Datadog’s Application Performance Monitor has two plans, and the higher of these includes a Continuous Profiler and an Error Tracker. These functions parallel those of the Splunk Observability Cloud package with the Splunk On-Call add-on. As well as spotting problems in live code, this tool will feed them back to the development team. Subscribe to the new Datadog CI Visibility module to add on CI/CD pipeline management and look at the Incident Management module to complete the alternative to VictorOps. This is a SaaS platform, and all modules are offered on a 14-day free trial.
- PagerDuty is a very close rival to VictorOps. It relies on integrations with Slack and ServiceNow to mediate alerts raised by other monitoring tools and forward alerts according to rulebooks set up in the tool. Again, this is a SaaS package, and you can get it on a 14-day free trial.
- Invicti A continuous testing service that tracks the development of code through testing and continues to examine its performance, spotting errors when it is live. This package can be integrated with Jira to complete work allocation and task management for identified performance problems. This package is available as a SaaS platform or for installation on Windows Server. Request a demo.
- OpsGenie is an on-call and alert management system from Atlassian, the makers of Jira, the project management tool. OpsGenie provides careful routing and integrates with Jira to provide complete development and redevelopment planning and supervision. Try it free for 14 days.