Siri and the Emergence of the Virtual Personal Assistant

Computing pioneers Vannevar Bush, J.C.R. Licklider, and Doug Engelbart envisioned computers as a way to extend the human mind’s capabilities. Their ideas proposed that by delegating a portion of our tasks to computing systems, we could more effectively manage the increasing complexity of our lives.

In 1997, I attended a brilliant presentation by wearable computing pioneer Dr. Thad Starner that made me aware of how this vision would be realized. At the time, Thad wore a PC/104-based computer equipped with a “Private Eye” head-worn display, a Twiddler chorded keyboard, and a CDPD wireless internet connection. With a series of demonstrations, he illustrated the concept of contextually aware computing, in which knowledge of location, time, and past user behaviors can be leveraged to better assist a person in completing their tasks. The idea is that through contextual information and a growing body of knowledge of a user’s habits, a computer interface can evolve to fit the user, as opposed to the user having to adapt to a static interface. He described how, over time, such an interface could learn enough about an individual to become a “digital doppelganger” that could independently handle a number of one’s routine responsibilities. As an example, he described a scenario in which it is December and your wearable computer uses its knowledge of your gift buying habits to complete all of your Christmas shopping on your behalf.

Read the rest of this entry »

The Reality of Augmented Reality

In 2009, augmented reality (AR) technology became mainstream. Though it has been under development for over four decades, in the past year it was prominently featured in major ad campaigns and was on the cover of Esquire. Concurrently, Layar, Wikitude, and a number of other AR applications were released for mobile phones. The future potential of AR has now captured the imagination of both the public and the press. The hype surrounding this technology is similar to the excitement over virtual reality during the 1990s and over 3D online communities, namely Second Life, during this past decade. Unfortunately, in the minds of consumers, neither of these technologies lived up to the hype. Due to a lack of understanding, virtual reality and 3D online communities were unfairly and prematurely dismissed as failures by many. AR is in danger of suffering the same fate. Geoff Northcott described the situation well in his post Augmented Reality, Second Life, and the Trough of Disillusionment.

In an effort to help manage expectations regarding AR technology, I will briefly describe what works today while clarifying what we can expect in the future.

Read the rest of this entry »

Improving Music Search With Machine Learning

Popular music search and discovery systems such as Pandora and Last.fm rely primarily upon human-entered annotations to properly classify songs for search retrieval. Though effective, human-centric approaches to music classification are labor intensive, and the recommendations that can be generated are limited in scope. For instance, a person must know the name of a particular artist or track in order to receive a recommendation. This is not a problem for music fans and aficionados, but it tends to limit the discovery possibilities for casual listeners who may not know a wide variety of artists and track names.

Researchers at the University of California San Diego Computer Audition Lab have developed a system that could address this problem by allowing people to find music using descriptive words rather than artist names and song titles.  For instance, a person could enter the words “high energy guitars” or “romantic vocals” and then receive a list of tracks that match that description.
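To make the idea of searching by description concrete, here is a minimal sketch in Python. It is not the researchers' code. The track names, tag names, and weights are all made up for illustration, and the ranking rule (sum the weights of the query's tags) is just one simple assumption about how such a search could work.

```python
# Hypothetical per-track tag weights between 0 and 1. In a real system
# these would come from an automatic tagger; here they are invented.
TRACK_TAGS = {
    "track_a": {"high energy": 0.9, "guitars": 0.8, "romantic": 0.1},
    "track_b": {"romantic": 0.9, "vocals": 0.7, "guitars": 0.2},
    "track_c": {"high energy": 0.4, "vocals": 0.5, "guitars": 0.6},
}


def search(query_tags, track_tags=TRACK_TAGS, top_n=3):
    """Rank tracks by the summed weight of the tags in the query."""
    scored = []
    for track, tags in track_tags.items():
        score = sum(tags.get(tag, 0.0) for tag in query_tags)
        if score > 0:  # skip tracks that match none of the query tags
            scored.append((score, track))
    # Highest score first; ties are broken alphabetically by track name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [track for _, track in scored[:top_n]]


print(search(["high energy", "guitars"]))
print(search(["romantic", "vocals"]))
```

A listener never has to type an artist or song title here; the query is nothing but descriptive words.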

The UCSD system is capable of ingesting songs and automatically tagging them with annotation data without human intervention. To provide accurate results, the system must first be taught to hear music and describe it using natural language. The training process uses digital signal processing and machine learning algorithms to expose the system to a broad array of music along with the words people use to describe it. For example, to be able to accurately identify music that is referred to as “driving rock”, the system must analyze a large number of driving rock songs and then identify the signal patterns that make that particular style of song unique.
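The train-then-tag loop described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the lab's actual model: it swaps in a nearest-centroid classifier, and the two-number feature vectors (imagined here as normalized tempo and loudness), the distance threshold, and the function names are all assumptions for illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (average feature vector) per tag.

    examples: list of (feature_vector, set_of_tags) pairs.
    """
    sums = {}
    counts = defaultdict(int)
    for features, tags in examples:
        for tag in tags:
            if tag not in sums:
                sums[tag] = [0.0] * len(features)
            sums[tag] = [s + f for s, f in zip(sums[tag], features)]
            counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def auto_tag(features, centroids, max_distance=0.3):
    """Return every tag whose centroid lies within max_distance of the song."""
    return sorted(
        tag for tag, centroid in centroids.items()
        if dist(features, centroid) <= max_distance
    )


# Made-up training data: (normalized tempo, normalized loudness) plus tags.
TRAINING = [
    ((0.9, 0.8), {"driving rock"}),
    ((0.8, 0.9), {"driving rock"}),
    ((0.2, 0.3), {"romantic vocals"}),
    ((0.3, 0.2), {"romantic vocals"}),
]

centroids = train(TRAINING)
print(auto_tag((0.85, 0.8), centroids))
```

The point of the sketch is the shape of the process: labeled songs go in, a per-tag pattern comes out, and new songs are tagged by how closely they match that pattern.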

The researchers have been gathering training data through crowdsourcing using an innovative Facebook game called “Herd-It”. In this game, users are played a song snippet and asked to associate descriptive words and phrases with it. Users earn points based on how well their answers match those of previous players. Here’s a video describing the game.
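The scoring idea, points for agreeing with earlier players, could look something like the Python sketch below. The actual Herd-It scoring rule is not described here, so the formula (points proportional to the share of previous players who gave the same answer) and the 100-point maximum are assumptions.

```python
from collections import Counter


def score_answer(answer, previous_answers, max_points=100):
    """Award points in proportion to how many earlier players agreed.

    Hypothetical rule: a player who matches everyone gets max_points,
    a player who matches no one gets 0.
    """
    if not previous_answers:
        return 0  # nothing to compare against yet
    # Compare answers case-insensitively and ignore stray whitespace.
    counts = Counter(a.strip().lower() for a in previous_answers)
    agreement = counts[answer.strip().lower()] / len(previous_answers)
    return round(max_points * agreement)


previous = ["mellow", "Mellow", "romantic", "mellow"]
print(score_answer("mellow", previous))    # 3 of 4 earlier players agree
print(score_answer("romantic", previous))  # 1 of 4 earlier players agree
```

A rule like this rewards consensus, which is what makes the collected words useful as training labels.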

The research group’s latest work in improving automatic music analysis was recently presented at the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in the paper “Dynamic Texture Models of Music,” by Luke Barrington, Antoni Chan, and Gert Lanckriet.

With the continuing decline of the radio DJ as tastemaker, web-based music search and discovery tools will become increasingly important. With further development, machine learning driven music search systems such as this one could provide an intuitive and compelling way for listeners to find music they will enjoy.

Live Concerts in Your Hand: Big Boi and Blink-182 in Augmented Reality

Recently, Doritos began an innovative campaign, Doritos Late Night, in which you can use a webcam and a bag of chips to see a concert appear in your hands. Bags of Doritos have been printed with a computer vision tracking marker which the webcam detects and uses to render a pre-recorded 3D concert. To see the 3D concert, you must first purchase a bag of Doritos printed with the marker. Next, you plug in your webcam, visit doritoslatenight.com and hold the bag in front of the camera. You can choose to see concerts from either Big Boi or Blink-182. The site was developed using the Flash AR Toolkit (FLARToolkit).

Here are video captures of the performances.

This promotion is an excellent example of using innovative technology and engaging content to capture audience attention. Simultaneously, it provides a unique avenue for artists to promote their music and live shows.

Augmented reality (AR) is a technology that has been around for quite a while, primarily in the academic and research domains. Recently, its mainstream presence has increased due to the development of the Flash version of the Augmented Reality Toolkit (FLARToolkit). The ARToolkit was developed by Dr. Hirokazu Kato with Dr. Mark Billinghurst at the University of Washington’s Human Interface Technology Lab (HITLAB) over ten years ago.

In 2000-2001, I led a team which modified the original C-based ARToolkit to work on 3D accelerated desktop Windows PCs. We used the toolkit to develop interactive augmented reality projections for the band Duran Duran’s live concert tour. Here’s a video of ARToolkit effects that we used in the live shows.

The project was documented in this presentation at the 2002 Augmented Reality Toolkit Workshop:

Jarrell Pair, Jeff Wilson, Jeff Chastine, Maribeth Gandy. “The Duran Duran Project: The Augmented Reality Toolkit in Live Performance”. The First IEEE International Augmented Reality Toolkit Workshop, 2002. Download PDF

Quividi: Smart Signage

As advertisers increasingly use digital signage, there will be a demand for detailed audience data akin to what is delivered by web analytics systems. Quividi has developed a camera-based solution for measuring impressions, watcher counts, and attention time for ads shown on displays inside stores, on sidewalks, and in other out-of-home locations. Using facial recognition technology, the system can also target ads by the audience’s gender. Similar advertising technology was depicted in the 2002 science fiction film Minority Report. Obviously, this product raises significant privacy concerns. Quividi addresses this concern by stating that no video is ever recorded; only data derived from the processed footage is kept. Here’s a short piece on Quividi from Advertising Age.
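To illustrate what "only derived data" might mean in practice, here is a minimal Python sketch. It is not Quividi's implementation. The `Detection` record, the half-second sampling interval, and the way impressions and watchers are defined are all assumptions made for this example; the only thing stored is a handful of numbers, never an image.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One per-frame result from a hypothetical face tracker (no pixels kept)."""
    watcher_id: int  # anonymous ID that lasts only while the person is in view
    looking: bool    # True if the face is turned toward the display


def summarize(detections, frame_interval=0.5):
    """Reduce per-frame detections to aggregate audience metrics.

    Assumed definitions: an impression is anyone detected near the display,
    a watcher is anyone who looked at it at least once, and attention time
    is the number of 'looking' frames times the sampling interval in seconds.
    """
    attention = {}
    for d in detections:
        if d.looking:
            attention[d.watcher_id] = attention.get(d.watcher_id, 0.0) + frame_interval
    passers_by = {d.watcher_id for d in detections}
    total_attention = sum(attention.values())
    return {
        "impressions": len(passers_by),
        "watchers": len(attention),
        "avg_attention_s": total_attention / len(attention) if attention else 0.0,
    }


frames = [
    Detection(1, True), Detection(1, True), Detection(1, True),
    Detection(2, False), Detection(2, False),
    Detection(3, True),
]
print(summarize(frames))
```

The output is the kind of summary a web analytics dashboard would show, which is the comparison the paragraph above draws.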

Mobile Personal Broadcasting: Ustream.tv, Qik, Kyte, Flixwagon

In 2007, the phenomenon of Justin.tv thrust the notion of personal live broadcasting into mainstream internet culture. Anyone with an internet connection and a USB webcam now has a plethora of options for live broadcasting with sites such as Stickam, Justin.tv, and others. In the mobile arena, Qik, Kyte, and Flixwagon have released applications allowing users to live stream from smartphones. This week, another player emerged, with Ustream.tv releasing its mobile broadcasting platform, which combines live video streaming with GPS mapping, voting, and live chat. Currently, the Nokia S60 series phones are the preferred hardware devices since Apple has been reluctant to approve live video streaming applications for the iPhone. However, Qik, Ustream.tv, and Flixwagon do provide applications for jailbroken iPhones.

Now that it’s possible, what will be the breakthrough applications for mobile live broadcasting? I think the answer may lie in looking at the trend of celebrities using Twitter. Twitter is popular with stars because it is simple and easy to maintain. Unlike a traditional blog or personal website, it can be updated almost spontaneously. Celebrities can easily enhance this fan communication channel using Ustream.tv, Qik, Kyte, or Flixwagon. In particular, mobile broadcasting could be appealing to touring musicians who rarely have an opportunity to sit at a computer and write a well-composed personal blog post. Rapper Soulja Boy has been an early adopter in this area, using Kyte’s mobile platform to keep his fans in the loop. In a similar fashion, Lil Wayne is set to begin using Ustream.tv’s mobile application. Mobile broadcasting is clearly a concept to keep an eye on over the next year.

Here’s Ustream.tv’s mobile demo video.

Microsoft Tag: Hyperlinks in the Real World

Microsoft Research has developed an excellent system for creating mobile-readable tags which encode URLs. Called Microsoft Tag, it can function as an interesting alternative to short codes. The technology and its applications are explained in this video.

Microsoft Tag has recently become publicly available at http://tag.microsoft.com/. Visit http://gettag.mobi/ to obtain a reader for most camera-equipped smartphones. Potential applications for media distribution, promotion, and advertising are vast. For example, a band could enhance its show flyers with tags linking to online ticket offers and guest lists. A couple of years ago I experimented with Semacode, a similar technology, to develop demos showing how tags could be used to distribute music videos and film trailers.

Demo 1: Music Video Distribution With Visual Tags

Demo 2: Film Trailer Distribution With Visual Tags

Video Ads: The Next Generation

Recently, I have run across two technologies that have potential for giving web video advertisers options beyond the widely used preroll, postroll, and overlay ads. The first is from a company called Innovid which enables the integration of clickable virtual products and ad messages into the content of a video scene.  For example, a jewelry advertiser could insert clickable rings and necklaces on a table. Check out their demo ad gallery here.

The second is Zunavision, a spin-off from the Stanford Artificial Intelligence Lab. Zunavision’s technology allows advertisers to place ads onto the sides of buildings, walls, and other surfaces within a video. I was very impressed by its ease of use.

Zunavision Demo Video

The Purpose of Laboratory4

In the early 1990s, the term “digital convergence” emerged to refer in part to the imminent integration of networked computing systems and television. Despite industry and consumer enthusiasm, this process took far longer than many of us hoped. Legal hurdles, regulatory complexities, technological limitations, and infrastructure issues significantly slowed progress. Finally, in 2009, with the explosive growth of internet delivered video in all its forms, we can argue that the initial stage of digital convergence is complete.

We are now faced with new challenges.  An astounding set of powerful media devices, platforms, and services have become available which present new possibilities for how we interact with media.  At the same time, questions regarding monetization and consumer adoption of these new capabilities remain largely unanswered.

Laboratory4 will discuss and track interactive media technology trends as we travel the road to define the next stage of digital media convergence, one that will likely focus less on the delivery of media and more on how we interact with it.

Topics will span a variety of areas related to advanced interactive media technologies including:

Mobile Platforms and Services: There are well over 3 billion mobile phones available for use in the world today.  Many of them are powerful, internet connected multimedia computers. How can these nearly ubiquitous computing devices be leveraged to create new ways of interacting with and experiencing media?

The New Music Industry: Physical music sales are nearly dead.  An enormous amount of music content can be found free online.  Blogs and social networks are replacing the radio DJ as tastemaker.  What technologies and services will further evolve how music is experienced and created?

The Transformation of Film and Television: The movie and TV industry is going through the same painful process of change that the music industry was thrust into a decade ago. What role will new technologies play as this business struggles to adapt?

Emerging Technologies: Augmented reality, online virtual worlds, 3D displays, micro projectors, perceptual user interfaces, and other advanced technologies have enormous potential.  What role will they play?

Business Models: How can new and existing technologies and services be packaged to provide sustainable revenue streams?

The journey ahead is complex and risky.  For those of us inspired to embark upon it, opportunities for true innovation await.

Onward,

Jarrell Pair

CTO
LP33.tv

