Abstract:
A road sign interpretation system includes a front-facing camera mounted on or in a vehicle that collects image data of multiple road signs. A first convolutional neural network (CNN) receives the image data from the front-facing camera and yields a set of sign predictions including one or more sign text instances. A second CNN, defining a text extractor, receives the image data from the front-facing camera and extracts text candidates including the multiple sign text instances. The second CNN also performs sign and sign-data localization to compute a text order from the multiple sign text instances. A sign text synthesizer module receives individual sign text instances from the first CNN and digitized forms of those instances from an optical character recognizer (OCR). A semantic encoding and interpretation module receives the sign text instances and identifies semantics of the multiple road signs.
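As a rough illustration of the two-branch pipeline this abstract describes, the sketch below wires a detection CNN, a text-extractor CNN, an OCR stage, and a synthesizer step together. Every name here (detector_cnn, text_extractor_cnn, ocr, interpret_frame) and all stub outputs are hypothetical placeholders standing in for trained networks, not the patented implementation.

```python
# Minimal structural sketch of the two-branch pipeline; all names and
# return values are illustrative placeholders, not the patent's code.
from dataclasses import dataclass
from typing import List

@dataclass
class SignTextInstance:
    text: str        # digitized sign text from OCR
    order: int       # reading order computed by the text-extractor CNN
    confidence: float

def detector_cnn(image) -> List[dict]:
    # First CNN: yields sign predictions with candidate text regions.
    return [{"sign_type": "speed_limit", "regions": [(0, 0, 64, 32)]}]

def text_extractor_cnn(image) -> List[tuple]:
    # Second CNN: extracts text candidates and computes a text order.
    return [((0, 0, 64, 32), 0)]

def ocr(region) -> str:
    # OCR stub digitizes one text candidate; a real system would crop
    # the region from the frame and decode it.
    return "SPEED LIMIT 55"

def interpret_frame(image) -> List[SignTextInstance]:
    predictions = text_candidates = None
    predictions = detector_cnn(image)       # branch 1: sign predictions
    text_candidates = text_extractor_cnn(image)  # branch 2: text + order
    instances = [SignTextInstance(ocr(box), order, 1.0)
                 for box, order in text_candidates]
    # A sign text synthesizer would merge `predictions` with `instances`
    # before semantic encoding and interpretation.
    return instances

print(interpret_frame(image=None))
```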
Abstract:
A system for determining the relevance of a traffic sign for a vehicle includes at least one vehicle camera configured to provide a view of an environment surrounding the vehicle and a vehicle controller in electrical communication with the at least one vehicle camera. The vehicle controller is programmed to capture an image using the at least one vehicle camera, identify the traffic sign in the image, determine a pan angle and a tilt angle of the traffic sign based at least in part on the image, and determine the relevance of the traffic sign based at least in part on the pan angle and the tilt angle.
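A minimal sketch of the angle-based relevance test described above: a sign whose face is roughly perpendicular to the camera's optical axis (small pan angle) and roughly upright (small tilt angle) is treated as relevant to the ego vehicle, while a sharply angled sign is assumed to face another road. The thresholds and the function name sign_is_relevant are illustrative assumptions, not values from the source.

```python
# Hedged sketch of angle-based sign relevance. Pan is rotation of the
# sign face about its vertical axis relative to the camera's optical
# axis; tilt is rotation about the horizontal axis. Thresholds are
# illustrative assumptions only.
def sign_is_relevant(pan_deg: float, tilt_deg: float,
                     max_pan_deg: float = 25.0,
                     max_tilt_deg: float = 15.0) -> bool:
    return abs(pan_deg) <= max_pan_deg and abs(tilt_deg) <= max_tilt_deg

print(sign_is_relevant(5.0, 2.0))    # faces the ego lane  -> True
print(sign_is_relevant(70.0, 3.0))   # faces a cross street -> False
```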
Abstract:
A method for localizing and estimating a pose of a known object in a field of view of a vision system is described. The method includes developing a processor-based model of the known object, capturing a bitmap image file including an image of the field of view containing the known object, extracting features from the bitmap image file, matching the extracted features with features associated with the model of the known object, localizing an object in the bitmap image file based upon the extracted features, clustering the extracted features of the localized object, merging the clustered extracted features, detecting the known object in the field of view based upon a comparison of the merged clustered features and the processor-based model, and estimating a pose of the detected object in the field of view.
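The clustering and merging steps are the least self-explanatory part of this method, so the sketch below illustrates one plausible reading in plain NumPy: matched feature coordinates are greedily grouped by proximity, nearby clusters are merged, and a detection is declared when a merged cluster carries enough supporting features. The radii and vote threshold are assumptions for illustration, not values from the source.

```python
# Sketch of the cluster-and-merge stage. All thresholds are
# illustrative assumptions.
import numpy as np

def cluster_features(points: np.ndarray, radius: float = 20.0):
    # Greedy proximity clustering of matched feature coordinates (N x 2).
    clusters = []
    for p in points:
        for c in clusters:
            if np.linalg.norm(np.mean(c, axis=0) - p) < radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def merge_clusters(clusters, radius: float = 40.0):
    # Merge clusters whose centroids lie within `radius` of each other.
    merged = []
    for c in clusters:
        for m in merged:
            if np.linalg.norm(np.mean(m, axis=0) - np.mean(c, axis=0)) < radius:
                m.extend(c)
                break
        else:
            merged.append(list(c))
    return merged

pts = np.array([[10, 10], [12, 11], [14, 9], [120, 130], [121, 131]])
merged = merge_clusters(cluster_features(pts))
# Declare a detection for any merged cluster with enough supporting
# features; its centroid would seed the pose-estimation step.
detections = [np.mean(m, axis=0) for m in merged if len(m) >= 3]
print(detections)
```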
Abstract:
A method for language processing for a vehicle includes receiving an input text including a plurality of words. The method also includes determining a rule-based action representation of the input text and parsing the input text to produce a parsed text. The method further includes determining a model-based action representation of the parsed text, and determining a final action representation of the input text based at least in part on the rule-based action representation and the model-based action representation.
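A minimal sketch of fusing the rule-based and model-based action representations into a final one, assuming a trivial phrase-lookup rule engine, a whitespace parse, and a stubbed model. The fusion policy (prefer a rule hit, fall back to the model) and all names are assumptions, since the abstract does not specify how the two representations are combined.

```python
# Hedged sketch of rule/model fusion. The rule table, the stub model,
# and the prefer-the-rule policy are illustrative assumptions.
from typing import Optional

RULES = {
    "turn on the radio": "media.power_on",
    "set temperature":   "climate.set_temp",
}

def rule_based(text: str) -> Optional[str]:
    # Exact-phrase lookup stands in for a richer rule engine.
    for phrase, action in RULES.items():
        if phrase in text.lower():
            return action
    return None

def parse(text: str) -> list[str]:
    # Trivial whitespace parse producing the "parsed text".
    return text.lower().split()

def model_based(tokens: list[str]) -> str:
    # Stub for a learned model operating on the parsed text.
    return "media.power_on" if "radio" in tokens else "unknown.action"

def final_action(text: str) -> str:
    rule_action = rule_based(text)
    model_action = model_based(parse(text))
    # Prefer a rule hit when one exists; otherwise trust the model.
    return rule_action if rule_action is not None else model_action

print(final_action("Please turn on the radio"))  # -> media.power_on
```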