Introduction
One of the main goals of any state is to provide affordable, reliable, and sustainable electric power. A major obstacle is that power utilities do not always have a comprehensive map of their overhead power lines, and the diagrams that do exist are usually outdated and incomplete. Without a centralized map, governments and other organizations lack the knowledge needed to make informed decisions about investing in the maintenance or expansion of the grid. This lack of information also complicates decisions about installing alternative energy sources: without knowing where the conventional grid runs, it is hard to make sensible use of alternatives such as solar or wind power. Beyond the map itself, governments and organizations need a fast and cost-effective tool. The high-voltage grid is constantly expanding, so it is important to be able to produce accurate snapshots at regular intervals. The cGIS platform includes an algorithm that can efficiently map overhead power line infrastructure.
This feature provides an automatic, machine-learning-based solution to power line detection. The algorithm takes remote sensing data as input. A neural network processes the imagery in the visible spectrum and returns geospatial locations that are highly likely to contain elements of high-voltage infrastructure, namely power line poles. These results still require verification by an operator. Even in raw form, however, they are geographic positions that correspond with high probability to the locations of real objects on the ground.
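As an illustration of the last step, the sketch below filters detections by confidence and converts each box centre from pixel to map coordinates. It assumes a GDAL-style six-element affine geotransform; the `Detection` record and the `min_score` threshold are hypothetical, since the actual cGIS output format is not described here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# GDAL-style affine geotransform:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


@dataclass
class Detection:
    """Hypothetical detector output: a pixel-space box and a confidence score."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a (column, row) pixel position to map coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def detections_to_points(
    detections: Sequence[Detection], gt: GeoTransform, min_score: float = 0.5
) -> List[Tuple[float, float, float]]:
    """Return (x, y, score) for the centre of every box scoring at least min_score."""
    points = []
    for d in detections:
        if d.score < min_score:
            continue  # low-confidence candidates are discarded
        col = (d.x_min + d.x_max) / 2
        row = (d.y_min + d.y_max) / 2
        x, y = pixel_to_geo(gt, col, row)
        points.append((x, y, d.score))
    return points
```

The returned points are candidates for the operator to verify, not confirmed poles.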
Classification or detection?
The first step in building such a solution was to formulate the machine learning task itself. We were initially inspired by the way DevSeed solved a similar problem. Their solution classifies vector tiles at a map scale of 1:2000 according to whether a tile contains power line poles. The operator then traces the power lines manually over the classified tiles, using a web interface with an interactive map.
This approach to recognizing overhead power lines proved insufficiently effective, so we looked for a better solution: detecting the power poles themselves directly on the map.
We chose detection over classification because detection has the same disadvantages as classification but also offers the following advantages.
Advantages:
- Detection does not require a high zoom level such as 1:2000. Publicly available maps at scales larger than 1:4000 require a special pricing plan, whereas for detection the tile scale is not essential; it only matters that the pole itself is clearly distinguishable on the map.
- The manual tracing process is not required.
- The detection output immediately gives the location of the power line pole in geographic coordinates, allowing the power lines to be traced automatically.
Disadvantages:
- False positives, for example when trees are mistaken for power line poles.
- Misses, when the detection algorithm fails to find objects that are present on the map.
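One possible way to trace lines automatically from detected poles is to connect them with a minimum spanning forest that refuses edges longer than a plausible span. This is a sketch of one technique, not the method used in cGIS; `trace_lines`, `max_span_m` and its 400 m default are assumptions, and points are assumed to be in a projected coordinate system in metres.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates in metres


def trace_lines(points: Sequence[Point], max_span_m: float = 400.0) -> List[Tuple[int, int]]:
    """Connect pole points with a minimum spanning forest (Kruskal's algorithm).

    Edges longer than max_span_m are never added, so distant poles end up in
    separate line segments. Returns edges as (i, j) index pairs.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges within the span limit, shortest first (O(n^2) pairs)
    candidates = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= max_span_m:
                candidates.append((d, i, j))
    candidates.sort()

    edges = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

For large datasets, the quadratic pair search would need to be replaced by a spatial index.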
Raw data for machine learning
The input data for training are the geographic coordinates of power line poles, annotated with attributes such as pole type, material, voltage class, and others. The full dataset contains 211725 objects. Subsequent work used the data grouped by pole type, the most informative attribute. The data are summarized by this attribute below.
Type | Number of objects |
Reinforced concrete transmission line poles up to 20 kV | 119383 |
Wooden transmission line poles up to 20 kV | 83453 |
Intermediate poles of 110 kV transmission lines | 4938 |
110 kV anchor poles | 1176 |
Intermediate poles up to 330 kV | 808 |
Metal poles up to 20 kV | 604 |
Anchor poles up to 330 kV | 265 |
Intermediate poles of 35 kV transmission lines | 175 |
220 kV intermediate towers | 96 |
Substation and overhead line gantries of 110-330 kV | 39 |
35 kV transmission line anchor poles | 31 |
Mast poles, road poles | 12 |
110 kV OL and Substation gantries | 9 |
220 kV transmission line anchor poles | 8 |
Spans | 2 |
35 kV OL and PL portals | 2 |
Aerial bundled cables | 1 |
TOTAL | 211725 |
Because not all of these types can be distinguished in the imagery, only the following types were selected for the machine learning algorithm:
Type | Number of objects |
Intermediate poles up to 330 kV | 808 |
220 kV intermediate towers | 96 |
Intermediate poles of 110 kV transmission lines | 4938 |
Anchor poles up to 330 kV | 265 |
220 kV transmission line anchor poles | 8 |
110 kV anchor poles | 1176 |
TOTAL | 7291 |
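Grouping the raw records by pole type and keeping only the distinguishable types can be sketched as follows. The record layout (a dict with a `"type"` key) and the function names are hypothetical; `keep` would hold the six types listed in the table above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def summarize_by_type(records: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Count poles per type, most frequent first (ties keep first-seen order)."""
    return Counter(r["type"] for r in records).most_common()


def select_types(records: Iterable[Dict], keep: Set[str]) -> List[Dict]:
    """Keep only records whose type is visually distinguishable in the imagery."""
    return [r for r in records if r["type"] in keep]
```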
The training data consisted of 512x512-pixel raster images in which the areas occupied by high-voltage transmission towers were annotated. Each annotation was a rectangle covering, on average, an area of 70x70 meters on the ground.
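To turn a pole coordinate into a training annotation, the point has to be projected into the tile's pixel grid and expanded into a box about 70 m on a side. A minimal sketch, assuming a north-up tile described by a GDAL-style geotransform; the function name `pole_to_box` and the clipping to the 512-pixel tile are illustrative choices, not the documented pipeline.

```python
from typing import Tuple

# GDAL-style geotransform: (x_origin, pixel_width, 0, y_origin, 0, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pole_to_box(gt: GeoTransform, x: float, y: float,
                box_size_m: float = 70.0, tile_size: int = 512) -> Tuple[float, float, float, float]:
    """Return a (x_min, y_min, x_max, y_max) pixel box centred on a pole."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("only north-up rasters without rotation are handled")
    col = (x - gt[0]) / gt[1]
    row = (y - gt[3]) / gt[5]  # gt[5] is negative for north-up images
    half_w = box_size_m / 2 / abs(gt[1])
    half_h = box_size_m / 2 / abs(gt[5])

    def clip(v: float) -> float:
        # Keep the box inside the tile
        return min(max(v, 0.0), float(tile_size))

    return clip(col - half_w), clip(row - half_h), clip(col + half_w), clip(row + half_h)
```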
For training, the raster images were split into a training set (80%) and a test set (20%).
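A seeded 80/20 split at the level of whole images can be sketched as below. Splitting by image rather than by individual pole is an assumption here; it ensures that all poles from one tile land in the same subset. The function name and seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```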
Choice of solution architecture
In object detection tasks where target objects occupy only a small part of the image, the background regions without target objects contribute too much to training, which ultimately leads to many missed objects on the test set. To address this, we used Focal Loss, a loss function that reduces the influence of the abundant background and increases the weight of the rare objects during training.
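For reference, the focal loss as defined in the original RetinaNet paper, for a predicted probability p of the object class, a label y in {0, 1}, a weighting factor α and a focusing parameter γ:

```latex
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}
\qquad
\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases}
\qquad
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)
```

The factor (1 - p_t)^γ shrinks the loss of well-classified examples (p_t close to 1); with γ = 0 it reduces to α-weighted cross-entropy. The paper's defaults are γ = 2 and α = 0.25; the values used in this work are not stated.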
We therefore chose RetinaNet as the detection architecture, since it uses Focal Loss as its loss function.
During RetinaNet training, the loss is computed for all candidate regions (anchors) on all levels of the image pyramid, about 100 thousand regions per image in total. The Focal Loss value is the sum of its values over all anchors, normalized by the number of anchors assigned to target objects. Normalization uses only these anchors rather than the total count because the vast majority of anchors are easy background examples, each contributing little to the total loss.
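A NumPy sketch of this computation: sigmoid focal loss summed over all anchors and classes, then divided by the number of positive anchors. It assumes the inputs are already sigmoid probabilities and that targets are one-hot, with all-zero rows for background; anchors that the original paper ignores during training (IoU between 0.4 and 0.5 with any object) are not modelled.

```python
import numpy as np


def focal_loss(probs: np.ndarray, labels: np.ndarray,
               alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Sigmoid focal loss over all anchors, normalised by the number of positive anchors.

    probs:  (num_anchors, num_classes) sigmoid outputs
    labels: (num_anchors, num_classes) one-hot targets; all-zero rows are background
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # keep log() finite
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    per_entry = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    num_pos = max(int(labels.any(axis=1).sum()), 1)  # avoid division by zero
    return float(per_entry.sum() / num_pos)
```

Normalizing by positives keeps the loss scale stable even though the number of background anchors per image is huge.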
Structurally, RetinaNet consists of a backbone and two subnetworks: a classification subnet (Classification Subnet) and a bounding box regression subnet (Box Regression Subnet).
The backbone is a so-called Feature Pyramid Network (FPN) built on top of one of the widely used convolutional neural networks (for example, ResNet-50). FPN adds lateral outputs from the hidden layers of the convolutional network, which form pyramid levels at different scales. Each level is supplemented with "top-down knowledge", i.e. information from higher levels, which are smaller but carry information about larger areas. In practice, the more compressed feature map is upsampled (for example, by simply repeating elements) to the size of the current map, the two maps are summed element-wise, and the result is passed both to the lower pyramid levels and to the other subnetworks (the Classification Subnet and the Box Regression Subnet). This extracts from the input image a pyramid of features at different scales, on which both large and small objects can be detected. FPN is used in many architectures to improve the detection of objects at different scales, including RPN, DeepMask, Fast R-CNN, and Mask R-CNN.
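The top-down pathway described above can be sketched in NumPy. As in the FPN paper, each backbone map first passes through a 1x1 lateral convolution (written here as a channel-mixing matrix product) so that all levels share one channel count; the 3x3 smoothing convolutions and the extra P6/P7 levels that RetinaNet adds are omitted, and the weights are random placeholders.

```python
from typing import List

import numpy as np


def upsample2x_nearest(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map by repeating elements."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def lateral_1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution as a channel mix: (C_out, C_in) applied to (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def fpn_top_down(features: List[np.ndarray], weights: List[np.ndarray]) -> List[np.ndarray]:
    """features: backbone maps ordered from fine (large) to coarse (small),
    each level half the spatial size of the previous one.
    weights: one (C_out, C_in_i) lateral projection per map.
    Returns pyramid maps in the same order, all with C_out channels."""
    laterals = [lateral_1x1(f, w) for f, w in zip(features, weights)]
    outputs = [laterals[-1]]  # the coarsest level has nothing above it
    for lat in reversed(laterals[:-1]):
        top_down = upsample2x_nearest(outputs[0])
        outputs.insert(0, lat + top_down)  # element-wise sum of lateral and top-down paths
    return outputs
```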
Like the original, our network uses an FPN with five levels, P3 through P7. Level P_l has a resolution 2^l times lower than the input image. All pyramid levels have the same number of channels, C = 256, and the same number of anchors per location.
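The spatial size of each level follows directly from the 2^l rule. A sketch for the 512x512 tiles used here; the function name is illustrative.

```python
from typing import Dict, Tuple


def pyramid_shapes(image_size: int = 512, channels: int = 256,
                   levels: range = range(3, 8)) -> Dict[str, Tuple[int, int, int]]:
    """(channels, height, width) of each FPN level P_l, which is 2**l times smaller."""
    return {f"P{l}": (channels, image_size // 2 ** l, image_size // 2 ** l) for l in levels}
```

For 512x512 input this gives maps from 64x64 at P3 down to 4x4 at P7; the stride of level P_l is 2^l pixels, i.e. 8 to 128.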
The anchor sizes were chosen as follows: from 16x16 to 256x256 pixels for pyramid levels P3 to P7 respectively, with strides from 8 to 128 pixels. These sizes make it possible to analyze small objects together with some of their surroundings; in our case, power line poles together with their adjacent shadows.
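Generating these anchors can be sketched as follows. This assumes one square anchor per grid cell at each level, with sizes and strides doubling from 16/8 at P3 to 256/128 at P7; the text does not state how many scales and aspect ratios were used per location (the original RetinaNet uses nine).

```python
from typing import Dict, Sequence

import numpy as np


def level_anchors(image_size: int, stride: int, size: float) -> np.ndarray:
    """Square anchors of side `size` centred on every stride-spaced grid cell.

    Returns an (N, 4) array of (x_min, y_min, x_max, y_max) in pixels.
    """
    n = image_size // stride
    centres = (np.arange(n) + 0.5) * stride  # cell centres along one axis
    cx, cy = np.meshgrid(centres, centres)
    cx, cy = cx.ravel(), cy.ravel()
    half = size / 2
    return np.stack([cx - half, cy - half, cx + half, cy + half], axis=1)


def all_anchors(image_size: int = 512,
                sizes: Sequence[int] = (16, 32, 64, 128, 256),
                strides: Sequence[int] = (8, 16, 32, 64, 128)) -> Dict[str, np.ndarray]:
    """Anchors for pyramid levels P3..P7."""
    return {f"P{l}": level_anchors(image_size, s, a)
            for l, a, s in zip(range(3, 8), sizes, strides)}
```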
Training and Results
Machine learning results were evaluated in the following ways:
- Testing on the held-out 20% of the data by matching the predicted bounding boxes against the ground-truth boxes annotated manually by an expert. 70% of the rectangular regions were correctly identified, despite the relatively small amount of data.
- Calculating the distance between the actual and predicted coordinates over the entire dataset. In automatic localization of poles by the machine learning algorithm, up to 60% of the predicted locations corresponded to actual elements of the electrical network (poles) and lay, on average, no more than 70 meters from them. There were also cases where a pole was absent from the reference data but the algorithm found it in the image. Such cases lower the quality metric, because the object exists in the image but not in the reference database; this suggests that the algorithm can be used to update power line databases.
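For the first evaluation, matching predicted and annotated boxes is usually based on intersection over union (IoU). The sketch below uses greedy one-to-one matching with an IoU threshold of 0.5; both the threshold and the greedy strategy are assumptions, since the text does not specify the matching rule.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_rate(predicted: Sequence[Box], ground_truth: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Share of ground-truth boxes matched one-to-one by a prediction with IoU >= threshold."""
    used = set()
    matched = 0
    for gt in ground_truth:
        best_j, best_iou = None, threshold
        for j, p in enumerate(predicted):
            if j in used:
                continue  # each prediction may match only one ground-truth box
            v = iou(gt, p)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            used.add(best_j)
            matched += 1
    return matched / len(ground_truth) if ground_truth else 0.0
```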
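For the second evaluation, the distance from a reference pole to the nearest predicted location can be computed with the haversine formula. A sketch assuming (lat, lon) pairs in degrees and a spherical Earth with a mean radius of 6371 km; the function names are illustrative, and the brute-force nearest-neighbour search is only practical for small sets.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_distances(reference: Sequence[LatLon], predicted: Sequence[LatLon]) -> List[float]:
    """Distance from each reference pole to its nearest prediction (predicted must be non-empty)."""
    return [min(haversine_m(r[0], r[1], p[0], p[1]) for p in predicted) for r in reference]


def share_within(distances: Sequence[float], max_m: float = 70.0) -> float:
    """Fraction of distances not exceeding max_m."""
    return sum(d <= max_m for d in distances) / len(distances)
```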