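Image segmentation has many fun applications. In this post I use Matterport's Mask R-CNN to detect manhole covers on the road and mark them with a mask, so that a robot can detect manhole covers while it drives.

For now I simply think this may have some practical value in the future, so I built it to see how it works.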

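This article builds a custom Mask R-CNN model that detects manhole-cover regions on the road (see the image examples). Image segmentation can be used for many other things as well. In this instance-segmentation task the manhole covers are round, so unlike the previous two cases, the annotation in VIA uses a round region instead of a polygon (the sample JSON in manhole.py uses VIA's ellipse shape).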

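Contents

  • How to build Mask R-CNN
    • Collecting data
    • Annotating data
    • Training the model
    • Validating the model
    • Running the model on images and making predictions
    • Acknowledgements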

How to build a Mask R-CNN model for road manhole-cover detection

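To build a custom Mask R-CNN, we will use the Matterport GitHub repository, at github.com/matterport/M

Setting up Mask R-CNN is somewhat challenging; follow the instructions on GitHub. This Mask R-CNN implementation runs on Python 3 and is based on TensorFlow. Fortunately, I eventually got Mask R-CNN set up successfully.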

My environment:

Mask R-CNN setup notes, using NVIDIA CUDA

tensorflow-gpu==1.6.0

keras ==2.1.6

anaconda3

python 3.6.9

CUDA 9 with cuDNN 7.0.5

GTX1050ti

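nvidia-smi reports driver version 471.41, CUDA version 11.4

However, I actually use CUDA 9 and training succeeded. The CUDA version printed by nvidia-smi is the highest version the installed driver supports, so an older toolkit such as CUDA 9 still runs on it. In other words, the driver is backward compatible with older CUDA versions.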

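Collecting data

For this exercise I collected 100 images on the road with my phone, plus a few night-time videos. See some examples below. For instance, a shot taken at night looks like this:

Annotating data

The Mask R-CNN model requires the user to annotate the images and mark the manhole-cover regions. The annotation tool used in this tutorial is again VGG Image Annotator, v1.0.6. You can use the HTML version available at robots.ox.ac.uk/~vgg/so . With this tool you can create polygon masks; the round region mentioned above fits most manhole covers, as shown below:

Once all annotations are created, you can download them and save them in JSON format. Unlike LabelMe, VIA produces a single JSON file for the whole set.

The training set and the validation set each get their own JSON file (manhole.py reads via_region_data.json from the train and val subfolders).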
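Before training, it can help to check what an exported file actually contains. The following is a minimal sketch, assuming the VIA 1.x layout that manhole.py expects (a dict of entries, each with a `regions` dict whose values carry `shape_attributes`). The helper name `summarize_via` and the sample entries are made up for illustration.

```python
from collections import Counter


def summarize_via(annotations):
    """Count annotated images and region shapes in a VIA 1.x export.

    `annotations` is the dict obtained by json.load() on via_region_data.json.
    """
    shape_counts = Counter()
    annotated_images = 0
    for entry in annotations.values():
        regions = entry.get("regions") or {}
        if not regions:
            continue  # VIA also lists images that have no regions
        annotated_images += 1
        for region in regions.values():
            shape_counts[region["shape_attributes"]["name"]] += 1
    return annotated_images, dict(shape_counts)


# Hypothetical sample in the same layout as the export
sample = {
    "image54.jpg": {
        "filename": "image54.jpg",
        "regions": {
            "0": {"shape_attributes": {"name": "ellipse", "cx": 437, "cy": 1007,
                                       "rx": 278, "ry": 166},
                  "region_attributes": {}},
        },
    },
    "image55.jpg": {
        "filename": "image55.jpg",
        "regions": {
            "0": {"shape_attributes": {"name": "ellipse", "cx": 100, "cy": 120,
                                       "rx": 40, "ry": 30},
                  "region_attributes": {}},
            "1": {"shape_attributes": {"name": "polygon",
                                       "all_points_x": [1, 5, 5],
                                       "all_points_y": [1, 1, 5]},
                  "region_attributes": {}},
        },
    },
    "image56.jpg": {"filename": "image56.jpg", "regions": {}},
}

print(summarize_via(sample))
```

To run it on a real export, load the file with `json.load` first and pass the resulting dict to `summarize_via`.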

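Training the model

The training script is adapted from balloon.py. Training starts from the COCO pre-trained weights in H5 format (mask_rcnn_coco.h5).

Training command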

python3 manhole.py train --dataset=customImages/ --weights=coco

I was training on the CPU, and 10 epochs of 100 steps each took 14 hours. If you have a GPU available, use it.
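The timing above can be turned into a rough per-step cost. The sketch below only does the arithmetic on the figures quoted (14 hours, 10 epochs, 100 steps per epoch) and extrapolates to the `epochs=20` setting in the `train()` function of manhole.py. It assumes every step costs the same and ignores validation and checkpointing time, so treat the result as an estimate only.

```python
def seconds_per_step(total_hours, epochs, steps_per_epoch):
    """Average wall-clock seconds spent on one training step."""
    return total_hours * 3600 / (epochs * steps_per_epoch)


def estimated_hours(sec_per_step, epochs, steps_per_epoch):
    """Projected training time in hours for a given schedule."""
    return sec_per_step * epochs * steps_per_epoch / 3600


per_step = seconds_per_step(total_hours=14, epochs=10, steps_per_epoch=100)
print("{:.1f} s per step".format(per_step))
print("{:.1f} h for 20 epochs".format(estimated_hours(per_step, 20, 100)))
```

On these numbers a step takes about 50 seconds on the CPU, and the 20-epoch schedule would take roughly twice the reported 14 hours.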

manhole.py

"""
Mask R-CNN
Train on a custom manhole-cover dataset and implement color splash effect.
Adapted from balloon.py.

Copyright (c) 2018 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla

------------------------------------------------------------

Usage: import the module (see Jupyter notebooks for examples), or run from
       the command line as such:

    # Train a new model starting from pre-trained COCO weights
    python3 manhole.py train --dataset=/path/to/dataset --weights=coco

    # Resume training a model that you had trained earlier
    python3 manhole.py train --dataset=/path/to/dataset --weights=last

    # Train a new model starting from ImageNet weights
    python3 manhole.py train --dataset=/path/to/dataset --weights=imagenet

    # Apply color splash to an image
    python3 manhole.py splash --weights=/path/to/weights/file.h5 --image=<URL or path to file>

    # Apply color splash to video using the last weights you trained
    python3 manhole.py splash --weights=last --video=<URL or path to file>
"""

import os
import sys
import json
import datetime
import numpy as np
import skimage.color   # used by color_splash()
import skimage.draw
import skimage.io      # used to read and save images
import cv2
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.getcwd()

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn.config import Config
from mrcnn import model as modellib, utils
from mrcnn import visualize
from mrcnn.visualize import display_instances

# Path to trained weights file
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Directory to save logs and model checkpoints, if not provided
# through the command line argument --logs
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
############################################################
#  Configurations
############################################################

class CustomConfig(Config):
    """Configuration for training on the manhole-cover dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "manhole"

    # One image per GPU. Raise this only if your GPU has enough memory.
    IMAGES_PER_GPU = 1

    BACKBONE = "resnet50"

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # Background + manhole

    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512

    # Anchor side in pixels: (64, 128, 256, 512, 1024)
    RPN_ANCHOR_SCALES = (8*8, 16*8, 32*8, 64*8, 128*8)

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 85% confidence
    DETECTION_MIN_CONFIDENCE = 0.85
    DETECTION_NMS_THRESHOLD = 0.12
    DETECTION_MAX_INSTANCES = 10
############################################################
#  Dataset
############################################################

class CustomDataset(utils.Dataset):

    def load_custom(self, dataset_dir, subset):
        """Load a subset of the manhole dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have only one class to add.
        self.add_class("manhole", 1, "manhole")

        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # Load annotations
        # VGG Image Annotator saves each image in the form:
        # { 'filename': '28503151_5b5b7ec140_b.jpg',
        #   'regions': {
        #       '0': {
        #           'region_attributes': {},
        #           'shape_attributes': {
        #               'all_points_x': [...],
        #               'all_points_y': [...],
        #               'name': 'polygon'}},
        #       ... more regions ...
        #   },
        #   'size': 100202
        # }
        # An ellipse region looks like this instead:
        # { "filename": "image54.jpg",
        #   "base64_img_data": "", "file_attributes": {},
        #   "regions": {
        #       "0": {
        #           "shape_attributes": {
        #               "name": "ellipse",
        #               "cx": 437, "cy": 1007, "rx": 278, "ry": 166
        #           },
        #           "region_attributes": {}}
        #   }
        # }
        # We mostly care about the x and y coordinates of each region
        with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
            annotations1 = json.load(f)
        annotations = list(annotations1.values())  # don't need the dict keys

        # The VIA tool saves images in the JSON even if they don't have any
        # annotations. Skip unannotated images.
        annotations = [a for a in annotations if a['regions']]

        # Add images
        for a in annotations:
            # Get the shape description of each object instance (polygon
            # points or ellipse parameters). These are stored in the
            # shape_attributes (see json format above)
            polygons = [r['shape_attributes'] for r in a['regions'].values()]

            # load_mask() needs the image size to convert shapes to masks.
            # Unfortunately, VIA doesn't include it in JSON, so we must read
            # the image. This is only manageable since the dataset is tiny.
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "manhole",  # for a single class just add the name here
                image_id=a['filename'],  # use file name as a unique image id
                path=image_path,
                width=width, height=height,
                polygons=polygons)

    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a manhole dataset image, delegate to parent class.
        info = self.image_info[image_id]
        if info["source"] != "manhole":
            return super().load_mask(image_id)

        # Convert shapes to a bitmap mask of shape
        # [height, width, instance_count]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        # Passing shape= clips pixels that fall outside the image
        image_shape = mask.shape[:2]
        for i, p in enumerate(info["polygons"]):
            if p['name'] == 'ellipse':
                # Get indexes of pixels inside the ellipse and set them to 1
                rr, cc = skimage.draw.ellipse(p['cy'], p['cx'], p['ry'], p['rx'],
                                              shape=image_shape)
                mask[rr, cc, i] = 1
            elif p['name'] == 'polygon':
                # Get indexes of pixels inside the polygon and set them to 1
                rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'],
                                              shape=image_shape)
                mask[rr, cc, i] = 1

        # Return mask, and array of class IDs of each instance. Since we have
        # one class ID only, we return an array of 1s.
        # The builtin bool replaces np.bool, which newer NumPy releases removed.
        return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "manhole":
            return info["path"]
        else:
            return super().image_reference(image_id)

def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = CustomDataset()
    dataset_train.load_custom(args.dataset, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = CustomDataset()
    dataset_val.load_custom(args.dataset, "val")
    dataset_val.prepare()

    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                layers='heads')

def color_splash(image, masks, N):
    """Apply color splash effect.
    image: RGB image [height, width, 3]
    masks: instance segmentation masks [height, width, instance count]
    N: number of detected instances
    Returns result image (uint8).
    """
    # Make a grayscale copy of the image. The grayscale copy still
    # has 3 RGB channels, though.
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255

    # Red, as RGB values in [0, 1]. Every instance gets the same color.
    color = (1.0, 0.0, 0.0)

    # Start from the grayscale copy so that images without detections
    # still produce a valid result.
    splash = gray
    for i in range(N):
        # apply_mask blends the color into the image in place and returns it
        splash = visualize.apply_mask(splash, masks[:, :, i], color)

    # Convert to uint8 so that skimage.io.imsave and cv2.VideoWriter accept it
    return splash.astype(np.uint8)

def detect_and_color_splash(model, image_path=None, video_path=None):
    assert image_path or video_path

    # Output images are written to result/, so make sure it exists
    os.makedirs("result", exist_ok=True)

    # Image or video?
    if image_path:
        # Run model detection and generate the color splash effect
        print("Running on {}".format(image_path))
        # Read image
        image = skimage.io.imread(image_path)
        # Detect objects
        r = model.detect([image], verbose=1)[0]
        # Number of instances
        N = r['rois'].shape[0]
        print("\n*** instances to display :", N)
        if N > 0:
            # Color splash
            splash = color_splash(image, r['masks'], N)
            # Save output
            file_name = "result/splash_{:%Y%m%dT%H%M%S}.png".format(datetime.datetime.now())
            skimage.io.imsave(file_name, splash)
            print("Saved to ", file_name)
    elif video_path:
        # Video capture
        vcapture = cv2.VideoCapture(video_path)
        width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = vcapture.get(cv2.CAP_PROP_FPS)

        # Define codec and create video writer
        file_name = "splash_{:%Y%m%dT%H%M%S}.avi".format(datetime.datetime.now())
        vwriter = cv2.VideoWriter(file_name,
                                  cv2.VideoWriter_fourcc(*'MJPG'),
                                  fps, (width, height))

        count = 0
        success = True
        while success:
            print("frame: ", count)
            # Read next image
            success, image = vcapture.read()
            if success:
                # OpenCV returns images as BGR, convert to RGB
                image = image[..., ::-1]
                # Detect objects
                r = model.detect([image], verbose=0)[0]
                N = r['rois'].shape[0]
                print("\n*** instances to display :", N)
                # Color splash
                splash = color_splash(image, r['masks'], N)
                # Save each frame as an image. The frame counter keeps names
                # unique when several frames are processed within one second.
                frame_name = "result/mask_{:%Y%m%dT%H%M%S}_{:06d}.png".format(
                    datetime.datetime.now(), count)
                skimage.io.imsave(frame_name, splash)
                # RGB -> BGR to save image to video
                splash = splash[..., ::-1]
                # Add image to video writer
                vwriter.write(splash)
                count += 1
        vwriter.release()
        print("Saved to ", file_name)

############################################################
#  Training
############################################################

if __name__ == '__main__':
    import argparse

    # Parse command line arguments
    parser = argparse.ArgumentParser(
        description='Train Mask R-CNN to detect custom class.')
    parser.add_argument("command",
                        metavar="<command>",
                        help="'train' or 'splash'")
    parser.add_argument('--dataset', required=False,
                        metavar="/path/to/custom/dataset/",
                        help='Directory of the custom dataset')
    parser.add_argument('--weights', required=True,
                        metavar="/path/to/weights.h5",
                        help="Path to weights .h5 file or 'coco'")
    parser.add_argument('--logs', required=False,
                        default=DEFAULT_LOGS_DIR,
                        metavar="/path/to/logs/",
                        help='Logs and checkpoints directory (default=logs/)')
    parser.add_argument('--image', required=False,
                        metavar="path or URL to image",
                        help='Image to apply the color splash effect on')
    parser.add_argument('--video', required=False,
                        metavar="path or URL to video",
                        help='Video to apply the color splash effect on')
    args = parser.parse_args()

    # Validate arguments
    if args.command == "train":
        assert args.dataset, "Argument --dataset is required for training"
    elif args.command == "splash":
        assert args.image or args.video,\
               "Provide --image or --video to apply color splash"

    print("Weights: ", args.weights)
    print("Dataset: ", args.dataset)
    print("Logs: ", args.logs)

    # Configurations
    if args.command == "train":
        config = CustomConfig()
    else:
        class InferenceConfig(CustomConfig):
            # Set batch size to 1 since we'll be running inference on
            # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
            GPU_COUNT = 1
            IMAGES_PER_GPU = 1
        config = InferenceConfig()
    config.display()

    # Create model
    if args.command == "train":
        model = modellib.MaskRCNN(mode="training", config=config,
                                  model_dir=args.logs)
    else:
        model = modellib.MaskRCNN(mode="inference", config=config,
                                  model_dir=args.logs)

    # Select weights file to load
    if args.weights.lower() == "coco":
        weights_path = COCO_WEIGHTS_PATH
        # Download weights file
        if not os.path.exists(weights_path):
            utils.download_trained_weights(weights_path)
    elif args.weights.lower() == "last":
        # Find last trained weights.
        # Older Mask_RCNN versions return (log_dir, checkpoint_path) here;
        # if your version returns only the path, drop the [1].
        weights_path = model.find_last()[1]
    elif args.weights.lower() == "imagenet":
        # Start from ImageNet trained weights
        weights_path = model.get_imagenet_weights()
    else:
        weights_path = args.weights

    # Load weights
    print("Loading weights ", weights_path)
    if args.weights.lower() == "coco":
        # Exclude the last layers because they require a matching
        # number of classes
        model.load_weights(weights_path, by_name=True, exclude=[
            "mrcnn_class_logits", "mrcnn_bbox_fc",
            "mrcnn_bbox", "mrcnn_mask"])
    else:
        model.load_weights(weights_path, by_name=True)

    # Train or evaluate
    if args.command == "train":
        train(model)
    elif args.command == "splash":
        detect_and_color_splash(model, image_path=args.image,
                                video_path=args.video)
    else:
        print("'{}' is not recognized. "
              "Use 'train' or 'splash'".format(args.command))

The load_mask function in the code has been modified; an earlier article described how load_mask can handle circles and other shapes.
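To illustrate what converting a round region into a mask involves, here is a minimal NumPy-only sketch. It is a stand-in for the `skimage.draw.ellipse` call in manhole.py, not the code from the script. The function name `region_to_mask` is hypothetical, and the `circle` branch assumes VIA stores circles as `cx`, `cy`, `r`, which the listing above does not show.

```python
import numpy as np


def region_to_mask(shape_attrs, height, width):
    """Rasterize one VIA circle or ellipse region into a boolean mask."""
    name = shape_attrs["name"]
    if name == "circle":
        # Assumed VIA circle layout: center (cx, cy) and radius r
        rx = ry = shape_attrs["r"]
    elif name == "ellipse":
        rx, ry = shape_attrs["rx"], shape_attrs["ry"]
    else:
        raise ValueError("unsupported shape: {}".format(name))
    cx, cy = shape_attrs["cx"], shape_attrs["cy"]

    # Row (y) and column (x) grids that broadcast to [height, width]
    yy, xx = np.ogrid[:height, :width]
    # A pixel is inside when it satisfies the ellipse inequality.
    # Pixels outside the image are never generated, so no clipping is needed.
    return ((xx - cx) / rx) ** 2 + ((yy - cy) / ry) ** 2 <= 1.0


circle = region_to_mask({"name": "circle", "cx": 5, "cy": 5, "r": 2}, 10, 10)
print(circle.shape, circle.dtype, int(circle.sum()))
```

Stacking one such mask per region along the last axis gives the `[height, width, instance count]` array that load_mask returns.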

Running the model on images and making predictions

We modified color_splash in manhole.py so that every instance is masked with the same color.
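The idea of coloring every instance the same way can be shown without the Mask R-CNN library. The sketch below is an illustration in plain NumPy, not the code from manhole.py (which goes through `visualize.apply_mask` on a grayscale copy). The function name `overlay_instances` and the default `alpha` value are assumptions.

```python
import numpy as np


def overlay_instances(image, masks, color=(255, 0, 0), alpha=0.5):
    """Blend one color over every instance mask.

    image: uint8 RGB array [height, width, 3]
    masks: array [height, width, instance count]
    """
    # Collapse all instances into a single [height, width] boolean mask
    combined = masks.astype(bool).any(axis=-1)
    out = image.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    # Blend only the masked pixels; the rest of the image is left untouched
    out[combined] = (1 - alpha) * out[combined] + alpha * tint
    return out.astype(np.uint8)


# Tiny synthetic example: two one-pixel "instances" on a black image
image = np.zeros((4, 4, 3), dtype=np.uint8)
masks = np.zeros((4, 4, 2), dtype=bool)
masks[0, 0, 0] = True
masks[1, 1, 1] = True
result = overlay_instances(image, masks, alpha=1.0)
print(result[0, 0], result[1, 1], result[2, 2])
```

Because the masks are merged before blending, overlapping instances are tinted once rather than once per instance.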

Prediction command:

python3 manhole.py splash --image=customImages/test/bitauto.jpg --weights=mask_rcnn_damage_0010.h5

I also used ffmpeg to convert the output images into a GIF.

ffmpeg -r 2 -i %d.png 11.gif -y

-r 2: 2 frames per second

-y: overwrite the existing output file
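The `%d.png` pattern in the ffmpeg command expects frames named with consecutive integers, while manhole.py writes timestamped names such as result/mask_....png. One possible way to bridge the two is sketched below; the function name `number_frames` and the folder names in the usage comment are assumptions, and the sketch relies on the timestamped names sorting in capture order.

```python
import shutil
from pathlib import Path


def number_frames(src_dir, dst_dir, pattern="mask_*.png"):
    """Copy frames to dst_dir as 1.png, 2.png, ... in filename order."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Timestamped names sort chronologically when sorted as strings
    frames = sorted(Path(src_dir).glob(pattern))
    for index, frame in enumerate(frames, start=1):
        shutil.copyfile(str(frame), str(dst / "{}.png".format(index)))
    return len(frames)


# Example usage (hypothetical folders):
# number_frames("result", "frames")
```

After copying, run the ffmpeg command from inside the destination folder so that `%d.png` matches the numbered files.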

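The predictions contain some errors and missed detections. Enlarging the training set may improve accuracy.

Acknowledgements

Many thanks to Matterport for open-sourcing the code on GitHub, and to Priya for sharing a detailed blog post at analyticsvidhya.com/blo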