Invented by John D. Lord, Geoffrey B. Rhoads, Tony F. Rodriguez, Digimarc Corp
The Digimarc Corp invention works as follows: Cell phones, as well as other portable devices, are equipped with various technologies that improve existing functionality and provide new functionality. Some aspects relate to imaging architectures in which the image sensor of a mobile phone is just one stage in a series that acts on data and instructions to capture images. Some aspects relate to the distribution of tasks between the device and remote resources such as "the cloud." The cell phone can perform basic image processing such as edge detection and filtering, while other operations are outsourced to remote service providers. Remote service providers can also be identified using techniques like reverse auctions, in which they compete to process tasks. The disclosed technologies also include visual search capabilities and the determination of appropriate actions in response to various image inputs. Others concern the generation, processing, and representation of metadata. A great many other features and arrangements are also detailed.
Background for Mobile device and method of image frame processing by dedicated and programmable processors that apply different functions frame-by-frame
A Digimarc U.S. patent describes deriving information from imagery captured by a cell phone. The derived information is then submitted to a database (e.g., a remote database), which returns corresponding data. The cell phone displays that information or performs an action in response. This sequence of operations is sometimes referred to as "visual search."
Patent publications 20080300011 (Digimarc), U.S. Pat. No. 6,491,217 (Philips), 20020152388 (Qualcomm), 20020178410 (Philips) and 20050144455 (AirClic), U.S. Pat. No. 7,251,475 (Sony), U.S. Pat. No. 7,174,293 (Iceberg), U.S. Pat. No. 7,065,559 (Organnon Wireless), U.S. Pat. No. 7,016,532 (Evryx Technologies), U.S. Pat. Nos. 6,993,573 and 6,199,048 (Neomedia), U.S. Pat. No. 6,941,275 (Tune Hunter), U.S. Pat. No. 6,788,293 (Silverbrook Research), U.S. Pat. Nos. 6,766,363 and 6,675,165 (BarPoint), U.S. Pat. No. 6,389,055 (Alcatel-Lucent), U.S. Pat. No. 6,121,530 (Sonoda), and U.S. Pat. No. 6,002,946 (Reber/Motorola).
The presently detailed technologies concern improvements to such technologies, moving toward the goal of intuitive computing: devices that can hear and/or see, and infer what the user wants in that sensed context.
The present specification details an interrelated collection of work spanning a wide variety of technologies, including image processing architectures for cell phones, cloud computing, reverse-auction-based service delivery, metadata processing, and the conveyance of semantic information through imagery. Each section of the specification details technologies that can be incorporated into the others, so it is difficult to identify "a beginning." That said, let's just dive in.
There is currently a large disconnect between the enormous amount of data present in the high-quality image stream from a mobile phone camera and the device's ability to process that data. Off-device processing of visual data is one way to handle this firehose of data, especially when multiple visual processing tasks are desired. These issues become even more important when real-time object recognition and interactivity are considered: the user expects to see augmented-reality graphics on the screen of their mobile device as soon as they point the camera at an object or scene.
According to one aspect of the present technology, a network of distributed pixel processing engines serves such mobile device users, meeting a qualitative "human real-time interactivity" requirement: feedback is usually received in less than a second. Implementation should provide certain basic features on the mobile device, such as a close relationship between the output pixels of the image sensor and the native communication channel. The local device performs certain basic levels of content classification, filtering, and pixel processing. The word "session" is key: a session is essentially a duplex, packet-based communication between the mobile device and remote services, with fast, interactive responses sent back to the device. Every second, several incoming "pixel packets" and several outgoing packets may flow in each direction.
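Such a duplex session can be pictured as two packet streams, several packets per second in each direction, with feedback arriving well inside the one-second budget. The following is a minimal sketch under stated assumptions: the queue-based plumbing, packet fields, and the echoing `remote_service` stand-in are all illustrative inventions, not details from the patent.

```python
import time
from queue import Queue

# Toy duplex session: the device streams pixel packets out and reads
# service responses back, several per "second" of interaction.
outgoing: Queue = Queue()   # device -> cloud: pixel packets
incoming: Queue = Queue()   # cloud -> device: processing results

def remote_service() -> None:
    """Stand-in for a cloud pixel-processing engine: one result per packet."""
    while not outgoing.empty():
        packet = outgoing.get()
        incoming.put({"seq": packet["seq"], "result": f"processed:{packet['seq']}"})

start = time.monotonic()
for seq in range(8):                       # several packets in one burst
    outgoing.put({"seq": seq, "pixels": b"\x00" * 64})
remote_service()
responses = [incoming.get() for _ in range(8)]
elapsed = time.monotonic() - start         # under the sub-second feedback goal
```

In a real system the two queues would be network sockets and the service would run remotely; the point is only the shape of the exchange: packetized, duplex, and fast enough to feel interactive.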
The spreading out of applications from this starting point arguably depends on a core set of plumbing features inherent to mobile cameras. These features (a non-exhaustive list) include: (a) higher-quality pixel acquisition and low-level processing; (b) better device CPU and GPU resources for on-device processing of pixels, with subsequent feedback from users; and (c) structured connectivity to "the cloud." In addition, there is a maturing infrastructure for traffic monitoring and billing. FIG. 1 shows a graphic view of some of the plumbing features that make up what could be described as a visually intelligent system. For clarity, it omits the usual details of a mobile phone, such as the A/D converter, modulation and demodulation systems (IF stages), cellular transceiver, etc.
It is great to have better GPUs and CPUs on mobile devices, but that is not enough. Cost, weight, and power considerations seem to favor having "the cloud" do as much of the "intelligence" work as possible.
Relatedly, it appears there should be a standard set of "device-side" operations performed on visual data that all cloud processes will use, including certain formatting, graphic processing, and other rote tasks. It also seems there should be a standard basic header and addressing scheme for the (typically packetized) communication traffic back and forth.
FIG. 2 presents a non-exhaustive list that serves as an example of the many visual processing applications available for mobile devices. It is difficult not to see analogies between this list and how the human visual system and brain work; how "optimized" they are is a well-studied academic question. The eye-retina, optic nerve-cortex system serves an array of cognitive needs very efficiently. This aspect of the technology concerns how similarly efficient and broadly enabling components can be built into mobile phones, mobile device connections, and network services, with the goal of serving the applications shown in FIG. 2, as well as new ones that may appear as technology continues to evolve.
Perhaps the most important difference between mobile device networks and the human analogy revolves around the concept of the marketplace, where buyers continue to buy more and better products as long as businesses can profit. For any technology aiming to serve the applications listed in FIG. 2, it is a given that hundreds, if not thousands, of businesses will work on the details of commercial offerings in the hope of making money in some way. It is true that a few giants will dominate the mobile industry's main cash-flow lines, but it is also true that niche players will continue to develop niche applications and services. This disclosure explains how a market for visual processing services could develop in which business interests from across the spectrum can benefit.
FIG. 4, introducing the technology, is a sprint toward the abstract. We find an abstracted information bit derived from a batch of photons that impinged upon some electronic image sensor; a universe of consumers awaits this lowly bit. FIG. 4A illustrates the intuitively known concept that individual bits of visual data are not worth much unless they are grouped together in spatial and temporal groups. Modern video compression standards like MPEG7 and H.264 make good use of this core concept.
The "visual" character of the bits need not persist: certain processing can remove it (consider, for example, the vector string representing eigenface information). We sometimes use the term "keyvector data" (or "keyvector strings") to refer collectively to pixel data together with associated information and derivatives.
FIGS. 4A and 4B also introduce the key player of this disclosure: the packaged pixel packet with address labels, which contains keyvector data. Keyvector data can be a patch, a collection of patches, or a series of patches/collections. Pixel packets can be smaller than a kilobyte or much larger; a packet might carry a small patch of pixels from a large image, or a Photosynth of Notre Dame Cathedral.
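One way to picture such an addressed pixel packet is as a small header (session ID, sequence number, and the address of the service that should act on it) followed by the keyvector payload. The field layout below is invented for illustration; the patent does not specify a wire format.

```python
import struct
from dataclasses import dataclass

@dataclass
class PixelPacket:
    """Hypothetical pixel packet: an addressed unit of keyvector data."""
    session_id: int       # identifies the ongoing duplex session
    sequence: int         # ordering within the session
    service_address: str  # which remote pixel-processing service should act on it
    payload: bytes        # pixel patch or derived keyvector data

    def serialize(self) -> bytes:
        # 10-byte header: session (4), sequence (4), address length (2)
        addr = self.service_address.encode()
        header = struct.pack(">IIH", self.session_id, self.sequence, len(addr))
        return header + addr + self.payload

    @classmethod
    def deserialize(cls, data: bytes) -> "PixelPacket":
        session_id, sequence, addr_len = struct.unpack(">IIH", data[:10])
        addr = data[10:10 + addr_len].decode()
        return cls(session_id, sequence, addr, data[10 + addr_len:])
```

The payload is deliberately opaque: it could be raw pixels or a derived keyvector string (e.g., eigenface coefficients), which is precisely the point of the keyvector abstraction.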
(When actually pushed around a network, however, a pixel packet may be broken into smaller portions, as the transport-layer constraints of the network may require.)
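That fragmentation step can be sketched as follows. The 4-byte chunk header (index plus total count, so the receiver can reassemble out-of-order frames) is an assumption made for illustration; real systems would lean on the transport layer itself.

```python
def fragment(payload: bytes, mtu: int = 1400) -> list[bytes]:
    """Split a serialized pixel packet into transport-sized chunks.

    Each chunk is prefixed with a hypothetical 4-byte header: a 2-byte
    chunk index and a 2-byte total count.
    """
    chunks = [payload[i:i + mtu] for i in range(0, len(payload), mtu)] or [b""]
    total = len(chunks)
    return [bytes([idx >> 8, idx & 0xFF, total >> 8, total & 0xFF]) + c
            for idx, c in enumerate(chunks)]

def reassemble(frames: list[bytes]) -> bytes:
    """Reorder received frames by chunk index and concatenate their payloads."""
    ordered = sorted(frames, key=lambda f: (f[0] << 8) | f[1])
    return b"".join(f[4:] for f in ordered)
```

A 5 KB pixel packet, for instance, splits into four frames at a 1400-byte MTU, and `reassemble` recovers the original even if the frames arrive reversed.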
FIG. 5 is a segue diagram, still at an abstract level but pointing toward the concrete. A list of user-defined applications, such as illustrated in FIG. 2, will map to a state-of-the-art inventory of pixel processing methods and approaches which can accomplish each and every application. These pixel processing methods break down into common and not-so-common component sub-tasks. Object recognition textbooks are filled with a wide variety of approaches and terminologies which bring a sense of order into what at first glance might appear to be a bewildering array of "unique requirements" relative to the applications shown in FIG. 2. But FIG. 5 attempts to show that there are indeed a set of common steps and processes shared between visual processing applications. The differently shaded pie slices attempt to illustrate that certain pixel operations are of a specific class and may simply have differences in low-level variables or optimizations. The size of the overall pie (thought of in a logarithmic sense, where a pie twice the size of another may represent 10 times more Flops, for example), and the percentage size of the slice, represent degrees of commonality.
FIG. 6 is a big step toward the concrete, at some sacrifice of simplicity. The top portion of the figure is labeled "Resident Call-Up Visual Processing Services." A mobile device could be aware of, or even able to perform, all the applications of FIG. 2, but not all of them have to be active at all times; some subset of services is actually "turned on." As a one-time configuration step, the turned-on applications negotiate to identify common component tasks; this is the role of the "Common Processes Sorter." The first step is to generate an overall list of the pixel processing functions available for on-device processing. These routines can be selected from a library of basic image processing routines, such as FFT, filtering, edge detection, resampling, color histogramming, log-polar transforms, etc. Generation of corresponding Flow Gate Configuration/Software Programming information follows, which literally loads library elements into properly ordered places in a field-programmable gate array set-up, or otherwise configures a suitable processor to perform the required component tasks.
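The sorting step described above can be sketched in a few lines: given the component tasks each turned-on application needs, count how many applications share each library routine, so the most widely shared routines can be configured first and only once. The application names and their task decompositions below are illustrative assumptions, not details from the patent.

```python
from collections import Counter

# Hypothetical on-device library of basic image processing routines.
LIBRARY = {"fft", "filtering", "edge_detection", "resampling",
           "color_histogram", "log_polar"}

# Illustrative set of turned-on applications and the component
# tasks each one requires (invented for this sketch).
active_apps = {
    "barcode_reading":   ["filtering", "edge_detection", "resampling"],
    "face_recognition":  ["filtering", "resampling", "fft"],
    "watermark_reading": ["fft", "log_polar", "filtering"],
}

def sort_common_processes(apps: dict[str, list[str]]) -> list[str]:
    """Order library routines by how many active apps need them."""
    counts = Counter(task for tasks in apps.values() for task in tasks)
    return [t for t, _ in counts.most_common() if t in LIBRARY]

pipeline = sort_common_processes(active_apps)
# "filtering" is needed by all three apps, so it is configured first;
# the ordered list could then drive FPGA/processor configuration.
```

The output ordering is what the Flow Gate Configuration step would consume: shared routines are instantiated once and fed to every application that needs them, rather than duplicated per application.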