An industry-leading software company that supplies accurate, timely, and difficult-to-acquire data to utilities has created a custom ML model and generative AI tool that speeds up compliance reporting for power outages. This solution reduces the time that engineers spend diagnosing the root cause of power outages and assists them in writing reports that must be submitted related to those outages.
Challenge
When an electrical outage occurs, the root cause must be determined, and a report must be created that outlines the root cause and remedial actions taken. The report must follow a specific format and be submitted to the various energy regulation commissions. Prior to the creation of the customer’s AI solution, the process was entirely manual, taking multiple hours for each outage to determine the root cause and draft the required documentation.
Solution
The engagement began with the AllCloud team reviewing the reports that are manually created, learning the business process and the value stream associated with the work, and familiarizing themselves with the pain points that exist in the industry. At this point the team estimated services cost via the AWS Cost Calculator to ensure the customer and consultant had a joint understanding of the AWS technology that would be involved in the solution and gain an accurate estimate for the cost of their use case. The solution’s foundation is a custom AI model, built and deployed with full ML Ops in SageMaker. The model is capable of determining the arc type code associated with a fault in the power grid, based on data streamed from microswitches at substations.
Since the data that feeds the model comes from a streamed dataset, and as the data needs to be available in S3 for the Sagemaker model, the AllCloud team also built a data lake to house and prepare the information for inference.
Once the data lake was built to store the data and the custom model to determine the Arc Type existed, the team built a process leveraging AWS Bedrock to generate the Protection System Operation Analysis Reports (PSOAR) that engineers had previously been authoring manually for each outage. The reports generated by the AI solution are very similar in content and format to those manually generated.
While the solution does not eliminate the need for an engineer as a part of the process, it greatly reduces the amount of time spent finalizing and submitting the report to various agencies as required.
The entire solution was kept as serverless as possible, initially deployed within a POC/POV account, and later migrated to AllCloud’s Next Generation Landing Zone (NGLZ) to provide a secure and well-managed foundation for housing these workloads for the long term. Alongside designing the solution, the AllCloud team collaborated with the customer to integrate the inference endpoints and capabilities into their application. Additionally, the AllCloud team designed and implemented a framework for MLOps, CDK, and infrastructure promotions, enabling monitoring, continuous improvement, and reducing friction for ongoing updates to all resources.
Outcomes
As mentioned, the process at the beginning of the project was not only manual but also very time-intensive. When writing PSOARs, a protection and control engineer could spend up to four hours of manual work to generate a complete write-up. By leveraging Bedrock and the ancillary ML stack, the engineer can quickly obtain the most likely fault outcome, and generate a full first draft of a report available to them in roughly 10 seconds or less. Even if the engineer is especially thorough and does a half-hour review of the output, this is roughly an 88% time savings that can be gained on each and every report. As the team looks to the future, there is now an opportunity to get more accurate results, at faster intervals, and allow companies that manage the grid to have less downtime when incidents occur.