In order for Kismet to properly interact with human beings, it contains input devices that give it auditory, visual, and proprioception abilities. Kismet simulates emotion through various facial expressions, vocalizations, and movement. Facial expressions are created through movements of the ears, eyebrows, eyelids, lips, jaw, and head. The cost of physical materials was an estimated US$25,000.[1]
In addition to the equipment mentioned above, there are four Motorola 68332s, nine 400 MHz PCs, and another 500 MHz PC.[1]
Software system
Kismet's social intelligence software system, or synthetic nervous system (SNS), was designed with human models of intelligent behavior in mind. It contains six subsystems[2] as follows.
Low-level feature extraction system
This system processes raw visual and auditory information from cameras and microphones. Kismet's vision system can perform eye detection, motion detection and, albeit controversial, skin-color detection. Whenever Kismet moves its head, it momentarily disables its motion detection system to avoid detecting self-motion. It also uses its stereo cameras to estimate the distance of an object in its visual field, for example to detect threats—large, close objects with a lot of movement.[3]
Kismet's audio system is mainly tuned towards identifying affect in infant-directed speech. In particular, it can detect five different types of affective speech: approval, prohibition, attention, comfort, and neutral. The affective intent classifier was created as follows. Low-level features such as pitch mean and energy (volume) variance were extracted from samples of recorded speech. The classes of affective intent were then modeled as a gaussian mixture model and trained with these samples using the expectation-maximization algorithm. Classification is done with multiple stages, first classifying an utterance into one of two general groups (e.g. soothing/neutral vs. prohibition/attention/approval) and then doing more detailed classification. This architecture significantly improved performance for hard-to-distinguish classes, like approval ("You're a clever robot") versus attention ("Hey Kismet, over here").[3]
Motivation system
Dr. Breazeal figures her relations with the robot as 'something like an infant-caretaker interaction, where I'm the caretaker essentially, and the robot is like an infant'. The overview sets the human-robot relation within a frame of learning, with Dr. Breazeal providing the scaffolding for Kismet's development. It offers a demonstration of Kismet's capabilities, narrated as emotive facial expressions that communicate the robot's 'motivational state', Dr. Breazeal: "This one is anger (laugh) extreme anger, disgust, excitement, fear, this is happiness, this one is interest, this one is sadness, surprise, this one is tired, and this one is sleep."[4]
At any given moment, Kismet can only be in one emotional state at a time. However, Breazeal states that Kismet is not conscious, so it does not have feelings.[5]
Motor system
Kismet speaks a proto-language with a variety of phonemes, similar to a baby's babbling. It uses the DECtalk voice synthesizer, and changes pitch, timing, articulation, etc. to express various emotions. Intonation is used to vary between question and statement-like utterances. Lip synchronization was important for realism, and the developers used a strategy from animation:[6] "simplicity is the secret to successful lip animation." Thus, they did not try to imitate lip motions perfectly, but instead "create a visual shorthand that passes unchallenged by the viewer."