Abstract:
In one approach, a controller computer performs a pre-processing phase that involves applying automatic facial recognition, audio recognition, and/or object recognition to frames or static images of a media item to identify actors, music, locations, vehicles, props, and other items that are depicted in the program. The recognition results are used as the basis of queries to one or more data sources to obtain descriptive metadata about the people, items, and places that have been recognized in the program. The resulting metadata is stored in a database in association with time point values indicating when the recognized people, items, and places appear in the particular program. Thereafter, when an end user plays the same program using a first-screen device, the stored metadata is downloaded to a second-screen device of the end user. When playback on the first-screen device reaches the stored time point values, one or more windows, panels, or other displays are generated on the second-screen device to present the metadata associated with those time point values.
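The following is a minimal Python sketch of the time-point-keyed metadata store and playback lookup described above. The class names, the program identifier, the one-second matching window, and the polling loop are all illustrative assumptions, not part of the disclosed system; the sketch only shows how metadata records tagged with time point values could be retrieved as playback on a first-screen device reaches those points.

    from dataclasses import dataclass, field

    @dataclass
    class MetadataRecord:
        time_point: float   # seconds from the start of the program
        category: str       # e.g. "actor", "music", "location", "prop"
        description: str    # descriptive metadata fetched from a data source

    @dataclass
    class MetadataStore:
        # Maps a program identifier to the records produced during pre-processing.
        records: dict[str, list[MetadataRecord]] = field(default_factory=dict)

        def add(self, program_id: str, record: MetadataRecord) -> None:
            self.records.setdefault(program_id, []).append(record)

        def due_at(self, program_id: str, playback_time: float,
                   window: float = 1.0) -> list[MetadataRecord]:
            # Return records whose time point falls within `window` seconds of
            # the current playback position on the first-screen device.
            return [r for r in self.records.get(program_id, [])
                    if abs(r.time_point - playback_time) <= window]

    # Pre-processing phase: recognition results become stored metadata records.
    store = MetadataStore()
    store.add("program-42", MetadataRecord(125.0, "actor", "Lead actor biography"))
    store.add("program-42", MetadataRecord(125.0, "music", "Title and artist of the current song"))
    store.add("program-42", MetadataRecord(310.0, "location", "Filming location details"))

    # Playback phase: the second-screen device polls the first screen's position
    # and surfaces any metadata associated with the current time point.
    for playback_time in (60.0, 125.0, 310.0):
        for record in store.due_at("program-42", playback_time):
            print(f"[{playback_time:>6.1f}s] {record.category}: {record.description}")

In a real deployment the lookup would be driven by a synchronization signal from the first-screen device rather than a fixed loop, but the keying of metadata by time point value is the same.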