**Abstract**

In the Turing test, a computer model is deemed to think intelligently if it can generate answers indistinguishable from those of a human. We previously proposed a Turing-like handshake test for evaluating motor aspects of machine intelligence. The test is administered through a telerobotic system in which an interrogator holds a robotic stylus and interacts with another party—human, artificial, or a linear combination of the two. Here, we analyze and test experimentally the properties of three versions of the Turing-like handshake test: Pure, Weighted, and Noise. We follow the framework of signal detection theory and propose a simplified model of the interrogator's human-likeness evaluation; we simulate this model and assess the statistical power of each version of the handshake test. Our simulation study suggests that the choice of the best test depends on how well the interrogator identifies a human handshake when compared with a model: the Pure test is better for intermediate and large levels of interrogator confusion, whereas the Weighted and Noise tests are better for low levels of confusion. We then present the results of an experimental study comparing three simple models of a human handshake. We demonstrate that it is possible to distinguish between these handshake models, and we discuss the relative advantages of each test, possible future handshake models, and the role of Turing-like tests in measuring and promoting the design of human-like robots for rehabilitation robotics, teleoperation, and telepresence.

**Introduction**

As long ago as 1950, Turing proposed that the inability of a human interrogator to distinguish between the answers provided by a person and those provided by a computer would indicate that the computer can think intelligently (Turing, 1950). The so-called Turing test has inspired many studies in the artificial intelligence community; however, it is limited to linguistic capabilities. We argue that the ultimate test must also involve motor intelligence, that is, the ability to physically interact with the environment in a human-like fashion, encouraging the design and construction of a humanoid robot with abilities indistinguishable from those of a human being. It has been suggested that robots that appear more human-like may be perceived as more predictable, and thus people are more likely to feel comfortable while interacting with them (Hinds, Roberts, & Jones, 2004); naturally, when physically interacting with a robot, such human-likeness is even more important. However, an ultimate Turing-like test for motor intelligence involves an enormous repertoire of movements. In this paper, we discuss the methodology of performing a reduced version of the ultimate test, based on the one-dimensional handshake test proposed earlier (Karniel, Avraham, Peles, Levy-Tzedek, & Nisky, 2010; Karniel, Nisky, Avraham, Peles, & Levy-Tzedek, 2010). In this reduced version of the Turing-like test for motor intelligence, a model of a human handshake is considered human if it is indistinguishable from a human handshake.


The handshake is of interest not merely as a reduced version of the ultimate humanoid test, but also due to its bidirectional nature, in which both sides actively shake hands and explore each other. Motor control research has concentrated on hand movements (Flash & Hogan, 1985; Karniel & Mussa-Ivaldi, 2003; Morasso, 1981; Shadmehr & Mussa-Ivaldi, 1994; Shadmehr & Wise, 2005; Wolpert & Ghahramani, 2000), generating a variety of hypotheses that could be applied to generate a humanoid handshake. In addition, the subjective perception of manual mechanical interaction with the external world has been studied extensively (R. Friedman, Hester, Green, & LaMotte, 2008; Jones & Hunter, 1990, 1993; Srinivasan & LaMotte, 1995; Tan, Durlach, Beauregard, & Srinivasan, 1995). Last but not least, the greatest progress in telerobotic (Hannaford, 1989; Niemeyer & Slotine, 2004; Yokokohji & Yoshikawa, 1994) and haptic (Biggs & Srinivasan, 2002; El Saddik, 2007; Okamura, Verner, Reiley, & Mahvash, 2011) technologies involves arm movements. The telerobotic interface is necessary to make the human–computer discrimination meaningful, much as the teletype was necessary to hide the computer from the questioning human in the original Turing test.

Handshaking has been discussed in the social context
(Chaplin, Phillips, Brown, Clanton, & Stein, 2000;
Stewart, Dustin, Barrick, & Darnold, 2008), but the development of artificial handshake systems is still in its
infancy (Bailenson & Yee, 2007; Jindai, Watanabe,
Shibata, & Yamamoto, 2006; Kasuga & Hashimoto,
2005; Kunii & Hashimoto, 1995; Miyashita & Ishiguro,
2004; Ouchi & Hashimoto, 1997; Sato, Hashimoto, &
Tsukahara, 2007; Wang, Peer, & Buss, 2009), and state-of-the-art studies mostly explore very basic forms of haptic interaction and collaboration (Bailenson & Yee,
2008; Bailenson, Yee, Brave, Merget, & Koslow, 2007;
Durlach & Slater, 2000; Gentry, Feron, & Murray-Smith, 2005; Groten et al., 2009; Hespanha *et al.*,
2000; J. Kim *et al*., 2004; McLaughlin, Sukhatme, Wei,
Weirong, & Parks, 2003). The proposed Turing-like
handshake test can be useful in identifying the aspects of
the theories that are essential for producing a human-like
handshake movement. In general terms, we assert that a
true understanding of the motor control system could
be demonstrated by building a humanoid robot that
moves and applies forces that are indistinguishable from
a human. Therefore, a measure of our distance from such
a demonstration could be most useful in evaluating current scientific hypotheses and guiding future neuroscience research.

Moreover, understanding the unique properties of healthy hand movement is important for clinical applications. For example, it would allow clinicians to discriminate between unimpaired hand movements and movements generated by motor-impaired individuals, such as patients with cerebral palsy (Roennqvist & Roesblad, 2007; van der Heide, Fock, Otten, Stremmelaar, & Hadders-Algra, 2005) or Parkinson's disease (van Den, 2000), among others. Such automatic discrimination can be useful for diagnosis as well as for assessment of treatment effectiveness.

The evaluation of human-likeness of haptic interaction with robotic manipulators has received little, yet growing, attention in recent years. Variable impedance control of a robotic manipulator was compared to constant impedance control in terms of perceived human-likeness (Ikeura, Inooka, & Mizutani, 1999) and human-like movements (Rahman, Ikeura, & Mizutani, 2002). The effect of visual and haptic rendering strategies on plausibility of social interaction was studied in the context of handshaking (Wang, Lu, Peer, & Buss, 2010). A recent study explored the human-likeness of feedforward- and feedback-based models for haptic interaction partners (Feth, Groten, Peer, & Buss, 2011).

In our previous studies, we presented initial exploration of the Turing-like handshake tests (Avraham, Levy-
Tzedek, & Karniel, 2009; Karniel, 2010; Karniel, Nisky, *et al*., 2010), and proposed three different methodologies for assessing the human likeness of a handshake
model (Karniel, Avraham, *et al.*, 2010): (1) a Pure test,
(2) a Weighted test, and (3) a Noise test. The methodologies are similar in the general experimental architecture:
a human interrogator shakes hands with another human,
with computer software, or with a combination of the
two. The handshake is performed through a telerobotic
system, as depicted in Figure 1. However, the three versions of the test are conceptually different: in the Pure
test, handshake models are compared against human
handshakes; in the Weighted test, combinations of a
model and human handshake with different weights are
compared; in the Noise test, models are compared
against human handshakes combined with noise.

In the current paper, we set out to explore the differences between these three versions of the Turing-like
handshake test in a simulation study based on the preliminaries from signal detection theory (SDT). To further explore the methodological differences between
these three versions, we present an experimental study
that uses all three methods to compare between three
simple models for a human handshake. The main contribution of this work is methodological, and hence, we
chose very simple and primitive models for a human handshake, and did not incorporate into the models any
of the abundant recent findings in human motor control.

We begin the paper with a brief introduction to SDT and psychometric function evaluation. We then describe the three proposed versions of the Turing-like handshake test, present our simulation study, and report the experimental comparison of three models for a human handshake using all three Turing-like tests. We conclude the paper with a discussion of the simulated and experimental results. Part of this content has also been reported in Karniel, Avraham, *et al.* (2010). However, here we add more definitions and assumptions required for the simulations in the following sections; moreover, the analysis and simulations reported are unique to this paper, and were only partly presented in abstract form (Avraham, Nisky, & Karniel, 2011).

**Preliminaries in Psychophysics—Signal Detection Theory and the Psychometric Function**

SDT is a mathematical framework for analyzing human decision making in perceptual and cognitive tasks that makes the role of decision processes explicit (Abdi, 2007b; Gescheider, 1985; Lu & Dosher, 2008; MacMillan, 2002). In particular, the theory provides computational tools for estimating the sensitivity and response bias of the participant in the task. In the original formulation of SDT, the task is to categorize ambiguous stimuli that can be generated either by a known process (signal) or arise by chance (noise); namely, a yes–no task. In another paradigm, the two-alternative forced choice (2AFC), the task is to sort two different stimuli into categories. In the current paper, we use the 2AFC paradigm in which the two stimuli in each trial are two different handshakes, and the categories are ‘‘more human-like'' and ‘‘less human-like.''
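The equal-variance Gaussian assumption links the overlap of the two internal-representation distributions directly to the proportion of correct responses in a 2AFC task. The following sketch (our illustration, with arbitrary parameter values, not taken from the paper) compares the standard analytic prediction, P(correct) = Φ(d′/√2), with a Monte Carlo estimate:

```python
import math
import random

def p_correct_2afc(mu_a, mu_b, sigma):
    """Analytic probability that stimulus 'a' yields the larger internal
    response in a 2AFC trial: P = Phi(d'/sqrt(2)), d' = (mu_a - mu_b)/sigma."""
    d_prime = (mu_a - mu_b) / sigma
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))  # Phi(d'/sqrt(2))

def simulate_2afc(mu_a, mu_b, sigma, n_trials, seed=0):
    """Monte Carlo estimate: draw one internal response per stimulus and
    count how often the 'a' stimulus wins the comparison."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu_a, sigma) > rng.gauss(mu_b, sigma)
               for _ in range(n_trials))
    return wins / n_trials
```

With identical means the analytic value is exactly 0.5 (pure guessing), and the simulated proportion converges to the analytic one as the trial count grows.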

According to SDT, the response of the participant
depends upon the intensity of a hidden variable—a continuously variable internal representation—and the participant makes the decision based on some objective criterion with regard to this representation. In the 2AFC
paradigm, in each trial, the participant compares the magnitudes of the internal representations of the two stimuli, and assigns the stimulus that generated the greater internal response to the category with the higher expected internal response. Importantly, the scale of internal representations is arbitrary, and does not necessarily correspond to some physical property of the stimulus.
Errors arise when the distributions of the categories
overlap, and the proportion of the correct responses is
used to estimate the extent of overlap between the internal representations of the different categories. In our
case, the hidden variable is internal representation of
human-likeness of a handshake, and we will designate it
as *h* in the remainder of the paper. SDT, as a theoretical
framework, does not specify the distribution of the internal representation; however, in most applications, the
distributions of the representations are assumed to be
Gaussian, and often, the variances of the different categories are assumed to be equal. We follow these common assumptions, and assume that *h* ~ *N*(μ, σ); namely, the probability density function of the internal representation of human-likeness of a handshake is:

$$f(h) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(h-\mu)^2}{2\sigma^2}\right) \tag{1}$$

where μ is the expected human-likeness of a specific handshake on an arbitrary scale, and σ is the standard deviation of the internal representation.

A closely related method of quantifying performance in a psychophysical task is the psychometric curve. A common experimental design for fitting such a curve is the method of constant stimuli: the task is similar to the classic description of SDT, but a standard stimulus (usually drawn from the middle of the stimulus range) is presented in each trial together with one of *n*_{s} comparison stimuli. The participant labels each as larger or smaller than the standard, and the function that quantifies the probability of answering ‘‘comparison stimulus was larger'' is the psychometric function. The presence of standards does not compromise the analysis because it gives no information regarding which response is appropriate (MacMillan, 2002).

The general form of the psychometric function is:

$$\psi(x) = \gamma + (1 - \gamma - \lambda)\, F(x;\, \eta, \xi) \tag{2}$$

where *x* is the physical property of the stimulus, and the shape of the curve is determined by the parameters (η, ξ, λ, γ) and the choice of a two-parameter function *F*, typically a sigmoid function (Wichmann & Hill, 2001). The parameters γ and λ are the guess and lapse rates—incorrect responses regardless of the difference between stimulus intensities—and η and ξ determine the shift and the slope of the sigmoid function, respectively. The choice of a specific function is determined by an assumption about the distribution of the internal representation, by how its parameters change with stimulus intensity, and by what the decision rule is (Garcia-Perez & Alcala-Quintana, 2011). Under the assumption in Equation 1, the psychometric function has a logistic form, namely,

$$\psi(x) = \gamma + (1 - \gamma - \lambda)\,\frac{1}{1 + e^{-(x-\eta)/\xi}}$$

The 0.5 probability point is interpreted as the point of subjective equality (PSE)—the comparison stimulus intensity that is perceived as equal to the standard—a measure of bias, and it is estimated as the inverse of the sigmoid function at the 0.5 threshold, *F*^{-1}(0.5). When the assumption in Equation 1 is not reasonable, it is still possible to estimate the PSE correctly by fitting other sigmoid functions, or by using distribution-free methods, for example, Spearman–Kärber (Miller & Ulrich, 2001).
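In practice, psychometric fitting is done with dedicated tools (such as the psignifit toolbox used later in this paper). The sketch below is a deliberately crude pure-Python stand-in on synthetic data: it fits a two-parameter logistic (no guess/lapse terms) by a maximum-likelihood grid search and reads off the PSE as the 50% point. All data values and grid ranges are illustrative assumptions:

```python
import math

def logistic(x, eta, xi):
    """Two-parameter logistic sigmoid; eta is the 50% point (the PSE),
    xi sets the slope."""
    return 1.0 / (1.0 + math.exp(-(x - eta) / xi))

def fit_pse(xs, n_chose_comparison, n_total):
    """Crude maximum-likelihood fit over a parameter grid (a stand-in for
    a proper psychometric-fitting toolbox); returns the estimated PSE."""
    best, best_ll = None, -float("inf")
    for eta in [i / 100.0 for i in range(-50, 51)]:
        for xi in [0.02, 0.05, 0.1, 0.2, 0.5]:
            ll = 0.0
            for x, k, n in zip(xs, n_chose_comparison, n_total):
                p = min(max(logistic(x, eta, xi), 1e-9), 1 - 1e-9)
                # Bernoulli log-likelihood of k "comparison" answers in n trials
                ll += k * math.log(p) + (n - k) * math.log(1 - p)
            if ll > best_ll:
                best_ll, best = ll, (eta, xi)
    return best[0]

# Synthetic data: proportions rising with x, roughly crossing 0.5 near x = 0.1
xs = [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]
n = [20] * len(xs)
k = [1, 3, 8, 10, 17, 19]   # counts of "comparison felt more human-like"
pse = fit_pse(xs, k, n)
```

A dedicated toolbox additionally estimates lapse parameters and confidence intervals; the point here is only how the PSE emerges from choice counts.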

**Three Turing-like Tests—Methods for Model Human-Likeness Grade Assessment**

Following the original concept of the classical
Turing test, each experiment consists of three entities:
human, computer, and interrogator. Two volunteers
participate in the experiment: human and interrogator.
Throughout the test, each of the participants holds the
stylus of one haptic device and generates handshake
movements (see Figure 1). In all three methods (Pure,
Weighted, and Noise), each trial consists of two handshakes, and the interrogator is asked to compare between
the handshakes and answer which of them feels more
human. Based on the answers of the interrogator, we calculate a quantitative measure for the human-likeness of a
handshake model, the model human-likeness grade
(MHLG). This grade quantifies the human likeness on a
scale between 0 and 1. The computer is a simulated
handshake model that generates a force signal as a function of time and the 1D position of the interrogator’s
haptic device (*x*_{inter}(*t*)) and its derivatives:

$$F_{model}(t) = \Phi[x_{inter}(t), t], \qquad 0 \le t \le T \tag{3}$$

where Φ[*x*_{inter}(*t*), *t*] stands for any causal operator, and *T* is the duration of the handshake.
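The operator Φ can be almost any causal mapping from the interrogator's trajectory to a force. As a concrete illustration only (not one of the models tested in this paper), a discrete-time spring-damper element computed causally from sampled positions might look like this; the stiffness and damping values are arbitrary:

```python
def spring_damper_model(x, dt, k=50.0, b=2.0, x0=0.0):
    """A minimal, hypothetical instance of the causal operator Phi:
    F_model(t) = -k*(x(t) - x0) - b*dx/dt, computed causally from the
    sampled interrogator position x (a list, one sample per dt seconds)."""
    forces = []
    prev = x[0]
    for xi in x:
        v = (xi - prev) / dt if dt > 0 else 0.0  # backward-difference velocity
        forces.append(-k * (xi - x0) - b * v)
        prev = xi
    return forces
```

Because only past and present samples are used, the model satisfies the causality requirement on Φ.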

The force feedback to the human is generated purely by the interrogator to preserve, as much as possible, the natural characteristics of the human handshake movement. The nature of the force applied on the interrogator is the key difference between the three methods that are discussed in this paper. In general, it is either pure human, pure computer, or a combined human and computer handshake (see Figure 2).

*Pure Turing-like Test*

The Pure Turing-like test is the most similar to the original notion of the Turing test for intelligence. In each trial, the interrogator is presented with a pure computer handshake, Figure 2(a), and a pure human handshake, Figure 2(b). Namely, the force that is transmitted to the interrogator is

$$F_{inter}(t) = \begin{cases} F_{model}(t) & \text{computer handshake} \\ F_{human}(t) & \text{human handshake} \end{cases} \tag{4}$$

where *F*_{model} is defined in Equation 3, and *F*_{human} is a function of the real-time position and/or force applied by the human and its derivatives, and its exact implementation depends on the teleoperation architecture. If we wish to compare *n*_{m} models in a single experiment, each block consists of *n*_{m} trials, one trial per model. Each experiment consists of *N* blocks, such that each computer handshake is repeated *N* times. The main purpose of the current study is to compare different methods for human-likeness assessment, and therefore, we choose *N* such that the total number of trials is identical between the different methods. In general, the choice of *N* affects the statistical power of the test, and should be determined according to the desired significance and power based on pilot studies. An analysis of the statistical power of this test and its relation to the number of blocks is presented at the end of this section.

For each model, we calculate the proportion of handshakes in which the interrogator chooses the computer handshake (*m*) over the human handshake (*h*) as more human-like, *p*_{m>h}. This is an estimate of the probability of the interrogator deciding that the modeled handshake is more human than the human handshake. We follow SDT, and assume that after completion of the probing phase of trial *i*, internal representations of human-likeness are formed for each of the individual handshakes, *h*_{i}^{m} and *h*_{i}^{h} (where the superscripts *m* and *h* indicate model and human handshakes, respectively). With this formulation, *p*_{m>h} is an estimate of the probability *p*(*h*_{i}^{m} > *h*_{i}^{h}). When the model is indistinguishable from a human, *E*(*p*_{m>h}) = *p*(*h*_{i}^{m} > *h*_{i}^{h}) = 0.5. We calculate the MHLG of the Pure test (MHLG_{p}) by multiplying this estimate by two, such that MHLG_{p} = 0 means that the model is clearly non-human-like, and MHLG_{p} = 1 means that the tested model is indistinguishable from the human handshake:

$$\mathrm{MHLG}_p = \min(2\, p_{m>h},\; 1) \tag{5}$$

Since the human handshake is the most human by definition, MHLG_{p} is cut off at 1.

Intuitively, when the interrogator is very good at identifying the human handshake when compared with any computer handshake, this test will be ineffective in grading the different models relative to each other. This is because SDT is based on the assumption that mistakes occur, and is not applicable otherwise. Therefore, we designed two additional versions of the Turing-like test. In both versions, the main idea is to present the interrogator with handshakes that are a mixture of human-generated and computer-generated handshakes, as shown in Figure 2(c–d). This increases the level of confusion of the interrogator, and allows an effective comparison between different models, even if each of the models, by itself, is quite far from being human.

*Weighted Turing-like Test*

In the Weighted Turing-like test, the interrogator is always presented with a combination of a human and a computer handshake, as shown in Figure 2(c). Namely, the force that is transmitted to the interrogator is:

$$F_{inter}(t) = \alpha\, F_{human}(t) + (1-\alpha)\, F_{model}(t) \tag{6}$$

where *F*_{model} and *F*_{human} are defined similarly to the definitions after Equation 4. The exact value of α is determined according to a predefined experimental protocol.
As in the Pure test, a single trial consists of two handshakes. In each trial, in one of the handshakes—the
comparison stimulus—the interrogator interacts with a
combination of forces according to Equation 6 with α = α_{comparison}, where α_{comparison} is one of *n*_{s} equally distributed values from 0 to 1, for example, α_{comparison} = {0, 0.2, 0.4, 0.6, 0.8, 1}. The other
handshake—the standard stimulus—is also a combination of forces, but with a fixed weight α = α_{standard} = 0.5,
generated online from the human and a reference model,
which is chosen before the experiment.

In each experiment, we assess the human-likeness of *n*_{m} − 1 test models and a reference model. In the trials in which we assess the human-likeness of the reference model, the interrogator is presented with the same model in both handshakes, but with different weights, and thus, these trials serve as the control. Each experimental block consists of *n*_{m}·*n*_{s} trials, comprising each of the *n*_{s} linear combinations of the model and the human for each of the *n*_{m} model comparisons. The order of the trials within each block is random and predetermined. Each experiment consists of *n*_{b} blocks, such that each combination is repeated *n*_{b} times. We choose the number of blocks *n*_{b} such that, similarly to the Pure test, each stimulus model is presented to the subject in *N* = *n*_{b}·*n*_{s} trials.

The next step in the assessment of the human-likeness of each model is to fit a psychometric curve, Equation 2, to the probability of the interrogator answering that a comparison handshake is more human-like as a function of *x* = α_{comparison} − α_{standard} (see Figure 11 for an example of psychometric curves derived from experimental data). We assume that a higher weight of the human handshake component in a combined handshake yields a higher probability of choosing that handshake as more human-like. Thus, this probability approaches one as the difference α_{comparison} − α_{standard} > 0 becomes larger, and zero for a larger difference in the opposite direction, α_{comparison} − α_{standard} < 0. This is a necessary assumption for the Weighted test, and hence, should be validated for each class of new models that are tested with it. This assumption was validated in our previous studies (Avraham *et al.*, 2009; Karniel, 2010; Karniel, Nisky, *et al.*, 2010) as well as in the experimental study of the current paper in the "Experiment" section. However, in the general case, there might be models for a human handshake that feel human-like when presented alone, but perform poorly when combined with a human handshake, and vice versa. In these cases, the Weighted method should not be used.

The PSE indicates the difference between α_{comparison} and α_{standard} for which the handshakes are perceived to be equally human-like. We use the estimated PSE for calculating MHLG_{w} according to:

$$\mathrm{MHLG}_w = 0.5 - \mathrm{PSE} \tag{7}$$

Thus, a model that is perceived to be as human-like as the reference model yields an MHLG_{w} value of 0.5. Models that are perceived as least or most human-like yield MHLG_{w} values of 0 or 1, respectively. Therefore, MHLG_{w} is cut off at 0 and 1.

The Weighted test is highly dependent on the successful fitting of the psychometric function. In Wichmann and Hill (2001), it was shown that the fitting process is most effective when the stimulus intensities are distributed symmetrically around the PSE, at low as well as high performance values. Therefore, the Weighted method will be most effective for a reference model that is similar or slightly less human than the tested models.

*Noise Turing-like Test*

The main methodological concern in using the Weighted test is the fact that it is not necessary that the model that is perceived to be most human-like when combined with a human handshake is actually most human-like when presented alone. Therefore, we designed a third method for the assessment of computer models' human-likeness. In the Noise Turing-like method, the interrogator is presented with either a computer handshake, as in Figure 2(a), or a human handshake combined with noise, as in Figure 2(d). This noise is chosen such that the resultant handshake is perceived to be the least human-like possible, and such that the human handshake is perceived as less human as the weight of the noise increases. This allows for an effective comparison of a pure model handshake with a human handshake corrupted by different levels of noise. The idea is that if more noise is required for degrading the human handshake such that it will be indistinguishable from the model, then the model is less human-like. Such an approach was suggested in the context of measuring presence in virtual environments according to the amount of noise required to degrade the real and virtual stimulation until the perceived environments are indiscriminable (Sheridan, 1994, 1996).

The protocol of the Noise Turing-like handshake test is similar to the Weighted protocol. In each trial, the interrogator is presented with two handshakes. In one of the handshakes—the standard stimulus—the interrogator interacts with a computer handshake model. The other handshake—the comparison stimulus—is generated from a combination of the human handshake and noise. In the current study, we chose the noise as a mixture of sine functions with frequencies above the natural bandwidth of the human handshake (Avraham *et al.*, 2009; Avraham, Levy-Tzedek, Peles, Bar-Haim, & Karniel, 2010; Karniel, 2010; Karniel, Nisky, *et al.*, 2010), but the general framework is flexible, and any other function can be used instead, as long as it is indeed not human-like. The comparison handshake force is calculated according to:

$$F_{inter}(t) = (1-\beta)\, F_{human}(t) + \beta\, F_{noise}(t)$$

where β is one of *n*_{s} equally distributed values from 0 to 1, for example, β = {0, 0.2, 0.4, 0.6, 0.8, 1}. Similar to the previous tests, at the end of each trial, the interrogator is requested to choose the handshake that felt more human-like.
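A sketch of how such a noise term and the blended comparison handshake could be generated. Both the 8–15 Hz band and the linear blending rule are illustrative assumptions on our part, not the exact signals used in the experiments:

```python
import math

def noise_force(t, freqs=(8.0, 11.0, 15.0), amp=1.0, phases=(0.0, 1.0, 2.0)):
    """Illustrative noise term: a mixture of sinusoids whose frequencies
    (here an assumed 8-15 Hz range) lie above the natural bandwidth of a
    human handshake, so the perturbation cannot pass as voluntary motion."""
    return amp * sum(math.sin(2 * math.pi * f * t + p)
                     for f, p in zip(freqs, phases))

def comparison_force(f_human, f_noise, beta):
    """Blend a human handshake force sample with a noise sample according
    to the weight beta in [0, 1] (beta = 0: pure human, beta = 1: pure noise)."""
    return (1.0 - beta) * f_human + beta * f_noise
```

At β = 0 the comparison stimulus reduces to the pure human handshake, and at β = 1 to pure noise, matching the endpoints of the protocol.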

The rest of the methodological details, including the fitting of the psychometric function, are similar to the Weighted test. However, the psychometric curve is now fitted to the probability of choosing the standard handshake as more human-like as a function of β, the relative weight of the noise in the comparison handshake. Namely, for a model that is indistinguishable from human, the expected PSE is 0. For a model that is as human-like as the noise function (hence, the least human-like model), the expected PSE is 1. Therefore, MHLG_{n} is calculated according to:

$$\mathrm{MHLG}_n = 1 - \mathrm{PSE} \tag{8}$$

Thus, models that are perceived as the least or the most human-like possible yield MHLG_{n} values of 0 or 1, respectively, and the estimates of MHLG_{n} are cut off at 0 and 1.
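The three grades can be summarized in one sketch. The Pure mapping follows the doubling-and-capping rule of Equation 5; the linear forms for the Weighted and Noise grades are our reading of the PSE-to-grade mappings described in the text (reference-equal maps to 0.5, noise-equal maps to 0), with the stated cutoffs at 0 and 1:

```python
def clip01(v):
    """Cut off a grade at the [0, 1] range."""
    return max(0.0, min(1.0, v))

def mhlg_pure(p_m_gt_h):
    """Pure test: twice the proportion of trials in which the model was
    judged more human-like than the human, capped at 1."""
    return min(2.0 * p_m_gt_h, 1.0)

def mhlg_weighted(pse):
    """Weighted test: PSE is alpha_comparison - alpha_standard at perceived
    equality; 0.5 means 'as human-like as the reference model'."""
    return clip01(0.5 - pse)

def mhlg_noise(pse):
    """Noise test: PSE is the noise weight beta at perceived equality;
    PSE = 0 -> indistinguishable from human, PSE = 1 -> noise-like."""
    return clip01(1.0 - pse)
```

Note that all three grades live on the same 0-to-1 scale, which is what makes the statistical comparison between the methods in the next section possible.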

**Simulation**

Intuitively, by design, the different methods that are described in the previous section are expected to be useful for different levels of sensitivity of the interrogator to the difference between human-generated and computer-generated handshakes. In the current section, we set out to quantify the difference between the approaches in terms of statistical power of each method under various conditions.

*Methods*

In order to build a simulation of different psychophysical approaches, we must make an assumption about the decision process underlying the answers of the interrogator. In the current work, we do not explore the decision process, and therefore, we make assumptions that will allow us to explore the different experimental methodologies. The guiding principle behind our assumptions is maximal simplicity. Therefore, we do not simulate the actual handshake, and instead we simulate a simplified decision process.

We follow the general framework of SDT, and assume that after completion of trial *i*, an internal representation of human-likeness (*h*_{i}) is formed, and we assume a Gaussian distribution for this internal representation, as specified in Equation 1. We also assume that for all computer, human, and combined handshakes, this distribution has an identical standard deviation but different means, and that these means are consistent across subjects. The mean of this distribution is the actual human-likeness of the handshake, and the variance represents the confusion level of the interrogator, namely, the decision variance (see Figure 3). We simulate the decision of the interrogator by choosing a random variable *h*_{i} from the appropriate distribution for each of the handshakes in a single trial, and answering according to the magnitude of *h*_{i}. The process is repeated for a number of trials, and the appropriate MHLG is calculated according to the simulated answers of the interrogator. We tested five models with means of internal representation of human-likeness μ_{m} = {50, 60, 70, 80, 100}, and compared them to a completely human handshake, for which the mean of the internal representation is μ_{human} = 100. This simulation was repeated for different decision standard deviation values, σ = {1, 10, 30, 50, 70, 90}.

We repeated the process 10 times to estimate the variability of the MHLG across repetitions of the simulation, so as to simulate repetition of the experiment with different subjects. This procedure also allowed us to perform a one-sided, paired *t*-test, and determine whether the MHLGs of two simulated models are significantly different (*p* < .05).

*Pure Test Simulation*

For each model, we repeated 80 trials in which a single sample from a random variable *h*^{m} ~ *N*(μ_{m}, σ) was compared to a sample from *h*^{h} ~ *N*(μ_{human}, σ). We calculated *p*_{m>h}, the proportion of trials in which *h*_{i}^{m} > *h*_{i}^{h}, and used it to calculate MHLG_{p} according to Equation 5.

*Weighted Test Simulation*

The Weighted Turing-like test is based on the assumption that a higher weight of the human handshake component in a combined handshake yields a higher probability of choosing a handshake as more human-like. We incorporated this assumption into the simulation by choosing the mean value for the human-likeness of a combined handshake as:

$$\mu_{combined}(\alpha) = \alpha\,\mu_{human} + (1-\alpha)\,\mu_{m}$$

without changing the standard deviation of the decision variable. We chose the least human-like model, μ_{m} = 40, as a reference model, and each of the tested models, μ_{m} = {50, 60, 70, 80, 100}, as comparison models, and performed a simulation of 10 blocks per interrogator. Within each block, for each model, α_{comparison} was assigned eight equally distributed values from 0 to 1: α_{comparison} = {0, 0.142, 0.284, 0.426, 0.568, 0.710, 0.852, 1}, and α_{standard} = 0.5. As in the Pure test simulation, each trial was simulated such that a single sample from a random variable *h*_{comparison} ~ *N*(μ_{combined}(α_{comparison}), σ) was compared to a sample from *h*_{standard} ~ *N*(μ_{combined}(α_{standard}), σ). Note that 10 blocks of eight levels of α_{comparison} yield a total of 80 trials per model, similar to the Pure test. This is important for comparability of the methods. At the end of the simulation, for each level of α_{comparison}, we calculated *p*_{c>s}(α_{comparison}), the proportion of trials in which *h*_{comparison} > *h*_{standard} for that particular level of α_{comparison}. We used the Psignifit toolbox version 2.5.6 for MATLAB^{1} to fit a logistic psychometric function (Wichmann & Hill, 2001) to the simulated answers of the interrogator and extract the PSE, and used it to calculate MHLG_{w} according to Equation 7. In the special case when *p*_{c>s}(α_{comparison}) > 0.5 for all α_{comparison}, the fitting of the psychometric function is not reliable. However, since this only occurs for models that are very human-like when compared with the reference handshake, we set MHLG_{w} = 1 in these cases.
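Because the decision variable is Gaussian with equal variances, the expected psychometric curve in this simulation has a closed form: the difference of the two samples is Gaussian with variance 2σ², so the probability of choosing the comparison handshake is Φ of the scaled mean difference. A sketch, assuming the linear mixing of means described in the text (parameter values are illustrative):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mu_combined(alpha, mu_model, mu_human=100.0):
    """Assumed mean human-likeness of a weighted handshake: linear
    mixing of the human and model means."""
    return alpha * mu_human + (1.0 - alpha) * mu_model

def p_choose_comparison(alpha_c, mu_model, mu_ref, sigma,
                        alpha_s=0.5, mu_human=100.0):
    """Expected probability that the comparison handshake (test model,
    weight alpha_c) is judged more human-like than the standard
    (reference model, weight alpha_s): the difference of two Gaussian
    samples is Gaussian with variance 2*sigma^2."""
    diff = mu_combined(alpha_c, mu_model, mu_human) \
         - mu_combined(alpha_s, mu_ref, mu_human)
    return phi(diff / (math.sqrt(2.0) * sigma))
```

This analytic curve is what the Monte Carlo trial counts approximate, and what the logistic fit then summarizes by its PSE.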

*Noise Test Simulation*

The simulation of
the Noise test was similar to the Weighted test, with a
few differences. We assumed that the noise that we add
to the human handshake is equivalent to combining the
human handshake with the least human-like model possible, namely, μ_{noise} = 40, and therefore:

Within each block, for each model, β was assigned with
eight equally distributed values from 0 to 1: β =
{0, 0.142, 0.284, 0.426, 0.568, 0.710, 0.852, 1}.

As in the previous simulations, each trial was simulated such that a single sample from a random variable *h*_{standard} ~ *N*(μ_{m}, σ) was compared to a sample from *h*_{comparison} ~ *N*(μ_{combined}(β), σ). At the end of the simulation, for each level of β, we calculated *p*_{s>c}(β), the proportion of trials in which *h*_{standard} > *h*_{comparison}. We extracted the PSE from a psychometric function and calculated MHLG_{n} according to Equation 8.
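In both simulations, the PSE is the stimulus level at which the fitted psychometric function crosses 0.5. A lightweight stand-in for the Psignifit logistic fit can be sketched as a grid-search least-squares fit (the parameter grids are illustrative assumptions):

```python
import numpy as np

def fit_pse(levels, proportions):
    """Crude logistic psychometric fit by grid search over (PSE, slope),
    minimizing squared error -- a lightweight stand-in for the Psignifit
    toolbox used in the paper, not its algorithm."""
    levels = np.asarray(levels, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    best_pse, best_err = None, np.inf
    for pse in np.linspace(levels.min(), levels.max(), 201):
        for s in np.linspace(0.01, 1.0, 100):
            pred = 1.0 / (1.0 + np.exp(-(levels - pse) / s))
            err = np.sum((pred - proportions) ** 2)
            if err < best_err:
                best_pse, best_err = pse, err
    return best_pse
```

For real data, a maximum-likelihood fit with confidence intervals (as Psignifit provides) is preferable; the grid search only illustrates the PSE-extraction step.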

*Statistical Power Analysis *

To compare
the performance of each of the tests for different levels of Δμ and different levels of confusion of the interrogator,
we performed a systematic statistical power and size of
effect analysis by means of Monte Carlo simulations
(Abdi, 2007a). For each of the tests, we repeated 100 simulations, in which we repeated five calculations of the MHLG for each of the models μ_{m} = {50, 52, 54, ..., 96, 98, 100}. Then, we performed a one-sided paired *t*-test between the MHLG of the worst model (μ_{m} = 50) and each of the other models. This choice of particular comparisons was arbitrary; once an MHLG for each model was calculated, any pair of models could be compared. The power of each Turing-like test is the proportion of the simulations in which the difference in MHLG was significant at the .05 significance level, and the size of the effect is the mean difference between the MHLGs that were compared. In the current paper, we state that a test has sufficient statistical power when this proportion is at least 0.8 (Cohen, 1988, 1992). Each of these procedures was repeated for different levels of the standard deviation of *h*_{i}, σ = {1, 4, 7, 10, ..., 97, 100}.

Next, we used a similar procedure to assess the power of the different tests in detecting a difference between the human-likeness of very similar models, Δμ = 6, a difference that is small when compared with mean values of 50–100, but large enough to be significant for the smallest level of interrogator confusion. Here, instead of comparing all models to the least human-like model, we compared models with similar levels of human-likeness. The idea was to explore whether the performance of a test depends on how human-like the two similar models are; namely, whether there is a difference in performance between comparing two very human-like models and comparing two models that are far from human-like.
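The inner loop of this power analysis can be sketched as follows (a schematic re-implementation; the hard-coded critical value t(0.95, df = 4) ≈ 2.132 for five paired repetitions replaces a full p-value computation, and the MHLG samplers are stand-ins for the test simulations):

```python
import numpy as np

rng = np.random.default_rng(1)

# One-sided critical value t(0.95, df = 4) for five paired repetitions.
T_CRIT = 2.132

def paired_t_stat(b, a):
    """Paired t statistic for H1: mean(b) > mean(a)."""
    d = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

def estimate_power(sample_worst, sample_other, n_sims=100, n_rep=5):
    """Monte Carlo power: the fraction of simulated experiments in which
    the MHLG difference is significant at the one-sided .05 level.
    sample_worst / sample_other are callables returning one simulated
    MHLG value per call (hypothetical stand-ins for a full simulation)."""
    hits = 0
    for _ in range(n_sims):
        a = [sample_worst() for _ in range(n_rep)]
        b = [sample_other() for _ in range(n_rep)]
        hits += paired_t_stat(b, a) > T_CRIT
    return hits / n_sims
```

For example, two samplers with well-separated means yield a power estimate near 1, while identical samplers yield an estimate near the .05 false-alarm rate.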

In the last part of the simulation, we wished to explore the sensitivity of our approach to the number of handshakes in each experiment. We repeated the analysis of the Pure Turing-like test for different numbers of blocks: 10, 20, 40, 60, 80, 100, and 200.

*Results *

The results of the simulations of all three tests are depicted in Figure 4. The left panels of the figure show the results of one repetition of the simulation, and the right panels show the mean of 10 MHLG values from repetitions of the simulation, together with 95 percent confidence intervals of the estimated mean. Successful discrimination between the different models yields a statistically significant increase of the MHLG as the actual model human-likeness increases. The results suggest that the Pure Turing-like test is successful for intermediate and large levels of decision variance of the interrogator, and completely useless for low levels of variance. This is not the case for the Weighted and Noise tests, which are best for a low level of variance in the decision, and become less sensitive with increasing decision variance.

Examining the right panels of Figure 4 leads to the prediction that increasing the number of subjects is expected to increase the sensitivity of almost all tests, with the exception of the Pure test at the lowest levels of decision variance.

*Statistical Power Analysis for
Comparison Between the Turing-Like Tests *

The results of the power and size of effect analysis for comparison between the least human-like model and all other models are depicted in Figures 5 and 6, and support the qualitative observations from the previous paragraph. The results of power analysis for comparisons of similar models (Δμ = 6) for models with different levels of human-likeness are depicted in Figure 7.

The Pure test has zero power for very small decision
variance, as shown in Figure 5(a) left. This is due to the
lack of overlap between the distributions of the internal
representations of human-likeness when the decision
variance is small. As the confusion of the interrogator
starts to increase, the power increases for large differences in human-likeness. The test is best for intermediate
levels of decision variance, σ ≈ 20; for these and larger
values, the Pure test has sufficiently high power starting
from Δμ > σ/3 (see Figure 6). Importantly, examining
the right part of Figure 5(a) suggests that the difference
in MHLG_{p} values is a monotonically increasing function
of the difference between the hidden human-likeness
levels of each model. In an analysis of comparison between similar models, Figure 7(a) reveals that the Pure test is sensitive to differences between similar, very human-like models (μ > 80) when the decision variance is intermediate, namely 5 < σ < 20. In general, these
observations are in accordance with the following intuitive reasoning: When comparing two models to each
other, the Pure Turing-like test is effective if at least one
of the models is human-like enough such that there is
some overlap between the distribution of internal representation of human-likeness, and the interrogator will
make enough mistakes when asked to compare between
the human and computer handshakes. However, if both
models are very human-like, it will be difficult to distinguish between them when the decision variance is large.

The Weighted test has high statistical power for the
smallest level of interrogator decision variance, as shown
in the left part of Figure 5(b), and in Figure 6. As the decision variance increases, the test loses statistical power,
until it becomes not sensitive enough (power < 0.8) for
σ > 25. Examining the size of effect analysis, as shown
in the right side of Figure 5(b), reveals that this happens
since the difference in the mean value of MHLG
decreases. In addition, the difference in MHLG_{w} values
is a monotonically increasing function of the difference
between the hidden human-likeness levels of each model
only in the range of interrogator decision variances
where the statistical power is high. This indicates a
potential caveat in the use of the Weighted test; however, since this only happens when the statistical power is
very low, it does not impose actual limitations. Namely, the Weighted test will not be used in this case, both
because of the lack of statistical power and because of inaccuracy. In an analysis of the comparison between similar models, Figure 7(b) reveals that the Weighted test is sensitive to differences between similar models that are not very human-like (μ < 75) for very low levels of decision variance, σ < 10. These results are in accordance with the following intuitive reasoning: If both compared models are
more human-like than the reference model handshake
combined with the human handshake, they are both estimated as maximally human (MHLG = 1), and, therefore,
there is no statistically significant difference between them.

The Noise test, similar to the Weighted test, has high
statistical power for the smallest level of interrogator decision variance, as shown in the left part of Figure 5(c).
As the confusion level increases, the power is still high
for Δμ > σ/2, as shown in Figure 6. Examining the right
panel of Figure 5(c) reveals that similarly to the Pure
test, the difference in MHLG_{n} values is a monotonically
increasing function of the difference between the hidden
human-likeness levels of each model. In addition, up to
σ = 40, this function does not change with interrogator
confusion level, which suggests more consistent results
between interrogators with different confusion levels. In
an analysis of the comparison between similar models, Figure 7(c) reveals that the Noise test is sensitive to a difference between similar models regardless of their level
of human-likeness for low levels of interrogator decision
variance, namely σ < 15.

To summarize, for very low levels of decision variance it
is appropriate to use either the Weighted or Noise
Turing-like tests. Starting from σ = 20, the Pure test outperforms the other tests. For very similar models, when
the decision variance is low, the Noise test is appropriate
for all levels of human-likeness, and the Weighted test is
appropriate only for models that are not very human-like. For intermediate levels of decision variance, the Pure test should be used, but it is likely to distinguish only between similar, very human-like models. For large levels of decision variance, none of the tests has enough statistical power to support statements about the difference in human-likeness between very similar models.

*The Effect of Number of Handshakes in
an Experiment *

The power of any statistical analysis increases with increasing sample size, since the uncertainty in any estimation is reduced when more data are sampled. Our MHLG estimation is no exception to this rule. Indeed, analysis of the power of the Pure test, as shown in Figure 8, reveals that using more handshakes in the assessment of MHLG yields smaller confidence intervals for the estimated MHLG, as shown in Figure 8(a), and an increase in the statistical power, as shown in Figures 8(b) and 8(c). The increase in power is due to the decrease in estimation uncertainty (not to be confused with the interrogator’s decision variance), and not to a change in the size of the mean difference in MHLG, which is similar to the right panel of Figure 5(a) regardless of the number of blocks. Importantly, we conclude from Figure 8(c) that the gain in power is small beyond 80 handshakes, and, therefore, we chose 80 handshakes per model in the experimental studies that are described in the next section.

**Experiment **

In the current section, we present our experimental comparison of the three Turing-like tests while trying to assess the human-likeness of three simple models for a human handshake.

*Models for Human Handshake *

A computer model of a human handshake is a force signal that is a function of time, of the 1D position of the interrogator’s hand, *x*_{inter}(*t*), and of its derivatives. In the most general notation, such a function is presented in Equation 3.
In our experimental study, we compared three simple
versions of such a function, which are depicted schematically in Figure 9. We considered three candidate models,
the linear spring, the linear spring and damper, and the
mixture of sinusoids.

1. Linear spring, K = 50 N/m, namely:

This model was chosen because it describes a very simple function between the movement of the interrogator and the force applied by the model that creates a form of interaction.

2. Linear spring and damper connected in parallel,
*K* = 20 N/m, *B* = 1.3 Ns/m, namely:

This model was chosen to represent the passive
mechanical characteristics of human movement.
It has an additional parameter when compared
with the previous model, and therefore, it is
expected to be ranked higher on the MHLG
scale.

3. Mixture of sinusoids with frequencies within the
typical range of human movement, between 1.5
and 2.5 Hz (Avraham *et al*., 2009, 2010; Karniel,
2010; Karniel, Nisky *et al*., 2010), namely:

where *U*(*a*, *b*) is a uniform distribution between *a* and *b*, with the probability density function *f*(*x*) = 1/(*b* − *a*) for *a* ≤ *x* ≤ *b*, and 0 otherwise.

This model was chosen to represent the class of
active handshakes, where the force applied on the
interrogator is actually not a function of his or
her movements.
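The three candidate models can be sketched as force functions (hypothetical code: Equations 12–14 are not reproduced in this excerpt, so the sign convention and the amplitude and phase handling are our assumptions; the text implies a mixture of three sinusoids):

```python
import numpy as np

rng = np.random.default_rng(2)

def spring_force(x, K=50.0):
    """Model 1: linear spring (K in N/m); assumed to oppose displacement."""
    return -K * x

def spring_damper_force(x, x_dot, K=20.0, B=1.3):
    """Model 2: spring and damper in parallel (K in N/m, B in Ns/m)."""
    return -(K * x + B * x_dot)

def sinusoid_mixture(t, n_sin=3, f_lo=1.5, f_hi=2.5, amp=1.0):
    """Model 3: active mixture of sinusoids with frequencies drawn
    uniformly from the typical human range [f_lo, f_hi] Hz; the force
    does not depend on the interrogator's position. Amplitude and
    random phases are illustrative assumptions."""
    freqs = rng.uniform(f_lo, f_hi, n_sin)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_sin)
    t = np.asarray(t, dtype=float)
    return amp * sum(np.sin(2.0 * np.pi * f * t + ph)
                     for f, ph in zip(freqs, phases))
```

The first two models react to the interrogator's motion; the third generates force autonomously, which is what makes it "active."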

*Methods *

*Experimental Procedure, Apparatus,
and Architecture *

Thirty volunteers participated in the
experiments after signing the informed consent form as
stipulated by the local Helsinki Committee. In each
experiment, two naive participants—human and interrogator—held the stylus of a PHANToM Desktop haptic device (SensAble Technologies) and generated handshake movements, as depicted in Figure 1. Throughout
the experiment, the interrogator was requested to answer which of the two handshakes within a single trial
felt more human by pressing the appropriate key on the
keyboard. Both haptic devices were connected to a Dell Precision 450 computer with a dual-CPU Intel Xeon 2.4 GHz processor. The position of the interrogator, *x*_{inter}(*t*), and of the human, *x*_{human}(*t*), along the vertical direction, were recorded at a sampling rate of 600 Hz. These position signals were used to calculate the forces that were applied to each of the devices according to the overall system architecture that is depicted in Figure 10. These forces were interpolated online and applied at 1000 Hz. The human always felt a force that is proportional to the difference between the positions of the interrogator and the human himself, namely:

*f*_{human}(*t*) = *K*_{t}(*x*_{inter}(*t*) − *x*_{human}(*t*))

where *K*_{t} = 150 N/m. The interrogator felt a force that is a combination of this force, a computer-generated function, and damping, namely:

*f*_{inter}(*t*) = *G*_{h}*K*_{t}(*x*_{human}(*t*) − *x*_{inter}(*t*)) + *G*_{c}*f*_{computer}(*t*) − *B*_{t}*ẋ*_{inter}(*t*)

where *G*_{h} and *G*_{c} are the gains of the human and computer force functions, respectively; the computer-generated force function *f*_{computer}(*t*) is either a handshake model or noise; *K*_{t} = 150 N/m is the gain of the position teleoperation channel; and *B*_{t} = 2 Ns/m is damping that is added at the interrogator side to ensure overall system stability. The gains and the computer-generated function were determined according to the exact protocols that are specified below.

The experiments were divided into two sessions that were performed in two visits to the lab on different days. Each session started with a practice of 60 handshakes in which the interrogator shook hands with the human through the telerobotic system, namely *G*_{h} = 1 and *G*_{c} = 0. The purpose of these practice trials was to enable the participants to become acquainted with a human handshake in our system.

*Experiment 1: Pure *

Five pairs of volunteers participated in the experiment. Each experimental block consisted of three trials in which we compared each of the three candidate models to a human handshake. In each trial, the interrogator was presented with a pure computer handshake, namely *G*_{h} = 0 and *G*_{c} = 1, and a pure human handshake, namely *G*_{h} = 1 and *G*_{c} = 0. The computer-generated force function was calculated according to one of the models, Equations 12, 13, or 14. Each model was presented once per block, and the order of the trials within each block was random and predetermined. Following our simulation, each experiment consisted of 80 test blocks. Two blocks were added at the beginning of the experiment for general acquaintance with the system and the task. The answers of the subjects in these blocks were not analyzed. In order to preserve the memory of the feeling of a human handshake in the telerobotic setup, after each group of nine experimental blocks (27 trials), the subject was presented with six human handshakes. To increase the motivation of the participants, they received a grade at the end of each block that was calculated based on their answers in the block.

After completion of both sessions, we calculated for each of the models the MHLG_{p} according to Equation 5.

*Experiment 2: Weighted *

Five pairs of volunteers participated in the experiment. In each trial, the interrogator was presented with two different combinations of a human handshake and a model: a standard and a comparison handshake. The force that was applied on the interrogator was calculated according to Equation 15 with *G*_{h} = α and *G*_{c} = 1 − α. The values of α were α = α_{comparison} and α = α_{standard} for the comparison and standard handshakes, respectively. The model in the standard handshake was always the linear spring, Equation 12, and the model in the comparison handshake was one of our three candidate models, Equations 12, 13, or 14.

Each experimental block consisted of 24 trials comprising each of the eight linear combinations of the stimulus and the human, Equation 6 with α = α_{comparison}, for each of the three models. The order of the trials within
each block was random and predetermined. Each experiment consisted of 10 blocks. Thus, each of the models
was presented to the interrogator in 80 comparison
handshakes. We added one practice block, namely 24 trials, for general acquaintance with the system and the
task. The answers of the interrogator in this block were
not analyzed. In order to preserve the memory of the
feeling of a human handshake in the telerobotic setup,
after each experimental block (24 trials), the subject was
presented with six human handshakes. To increase the
motivation of the participants, at the end of each block,
they received a grade that was calculated based on their
answers in the trials where the linear spring model was
presented both in comparison and standard handshakes.
In these trials, one of the handshakes is always composed of a greater weight of human forces than the other handshake. We assume that a handshake with a larger weight of human relative to computer forces is perceived as more human, and therefore, if the same model appears in both handshakes with different weights, the participant should be able to identify the handshake that is more similar to that of a human.

After completion of both sessions, we used the Psignifit toolbox version 2.5.6 for MATLAB to fit a logistic psychometric function (Wichmann & Hill, 2001) to the answers of the interrogator and extract the PSE. We then calculated the MHLG_{w} of each of the models according to Equation 7.

*Experiment 3: Noise *

Five pairs of volunteers participated in the experiment. In each trial, the interrogator was presented with a pure computer handshake, namely *G*_{h} = 0 and *G*_{c} = 1, which was one of the three candidate models, Equations 12, 13, or 14, and a human handshake combined with noise, namely *G*_{h} = 1 − β and *G*_{c} = β. The values of β were determined according to Equation 8. The noise function was calculated according to:

Note that the model for noise is similar to our mixture of sinusoids model, but the random frequencies were chosen above the typical bandwidth for human movements, between 2.5 and 3.5 Hz (Avraham *et al*., 2009, 2010; Karniel, 2010; Karniel, Nisky *et al*., 2010). In addition, we used a mixture of five rather than three sinusoids to ensure that the resultant signal would be perceived as noise by human subjects. We chose the amplitude of the sinusoids in the noise function such that the power of the noise signal was at least as high as the power that is generated during interaction with the tested models in the Pure test.

Within each block, there were eight trials in which the combined human-noise handshake with *G*_{h} = 1 − β and *G*_{c} = β, for each of the eight values of β, was compared to a combined human-noise handshake with *G*_{h} = 0.5 and *G*_{c} = 0.5. These trials were added to serve as data for a calibration curve for each subject, to make sure that, indeed, the human handshake with the higher noise component is perceived as less human-like. Overall, each experimental block consisted of 32 trials in which each of the eight linear combinations of the noise and the human (Equation 8) was compared with each of the three models and with the noise combined with the human model. Each experiment consisted of 10 blocks. Thus, each of the models was presented to the interrogator in 80 handshakes, similar to the protocols in Experiments 1 and 2. One experimental block (32 trials) was added at the beginning of the experiment for general acquaintance with the system and the task. The answers of the subjects in this block were not analyzed. In order to preserve the memory of the feeling of a human handshake in the telerobotic setup, after each experimental block, the subject was presented with six human handshakes. To increase the motivation of the participants, at the end of each block, they received a grade that was calculated based on their answers in the calibration trials.

After completion of both sessions, we fitted psychometric functions to the answers of the interrogators, extracted the PSE, and calculated the MHLG_{n} of each of the models according to Equation 9.

*Statistical Analysis *

The values of MHLG are bounded in [0, 1], regardless of the specific version of the Turing-like test that is used. Therefore, we used the nonparametric Friedman’s test (M. Friedman, 1937) to determine whether the difference between the MHLG values of the models is statistically significant, and the Wilcoxon signed-rank test with Bonferroni correction for multiple comparisons to perform the comparisons between the individual models.
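With one MHLG score per interrogator and model in hand, this analysis can be sketched with standard SciPy routines (a sketch; `compare_models` is our name, and the Bonferroni factor of 3 reflects the three pairwise comparisons between models):

```python
import numpy as np
from scipy import stats

def compare_models(mhlg):
    """mhlg: array of shape (n_subjects, 3), one MHLG score per model.
    Returns the Friedman p-value and Bonferroni-corrected Wilcoxon
    signed-rank p-values for the three pairwise model comparisons."""
    mhlg = np.asarray(mhlg, dtype=float)
    _, p_friedman = stats.friedmanchisquare(mhlg[:, 0], mhlg[:, 1],
                                            mhlg[:, 2])
    pairs = [(0, 1), (0, 2), (1, 2)]
    p_pairwise = {}
    for i, j in pairs:
        _, p = stats.wilcoxon(mhlg[:, i], mhlg[:, j])
        p_pairwise[(i, j)] = min(1.0, p * len(pairs))  # Bonferroni
    return p_friedman, p_pairwise
```

Note that with five subjects the Wilcoxon test uses its exact small-sample distribution, so the smallest attainable two-sided p-value is 0.0625 per comparison.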

*Results *

Examples of psychometric curves that were fitted to the answers of two selected interrogators from the Weighted and Noise tests are depicted in Figure 11.

First, as evident from the successful fitting of the psychometric curves, we validated the assumption that a handshake with a higher weight of the human handshake, relative to a computer-generated handshake model or noise, has a higher probability of being chosen as more human-like. Moreover, the calibration curves (i.e., the spring model in panel A and the noise model in panel B) indeed yield PSEs that are not statistically significantly different from 0 and 0.5, respectively.

Both interrogators perceived the linear spring model as the least human-like (solid curves). However, the spring and damper model was identified as most human-like in the Weighted test (Panel A, dashed curve), while the mixture of sinusoids model was perceived as the most human-like in the Noise test (Panel B, dotted curve).

The MHLG of individual subjects for each of the
Turing-like tests are presented in Figure 12. Estimations
of the mean of MHLG of all models from all tests are
presented in Figure 13, together with the 95 percent confidence intervals for these estimations. The Pure test was
not sensitive enough for discriminating between the
MHLG values of the three tested models, as shown in
Figure 12(a), and there was no statistically significant
effect of model (Friedman’s test, *p* = .45). This was due to the fact that when each of the interrogators was presented with one of the models and with a human handshake, he or she almost always correctly identified the human handshake, yielding very small MHLG values. This suggests that all three handshake models were far from human-like relative to the level of confusion of the interrogator, similar to the simulated results for very low decision variance (σ < 20).

Consistent with the predictions from our simulation
study, the Weighted and Noise tests revealed a significant effect of model on MHLG (Friedman’s test, *p* =
.049 and *p* = .015, respectively), as is clearly evident in
Figure 12(b–c) and Figure 13. This leads to the conclusion that for these models, the more appropriate test is
either the Weighted or Noise Turing-like test. Interestingly, while the linear spring model was least human-like
according to both tests, there was no agreement about
the most human-like model: the mixture of sinusoids
model was the most human-like according to the
Weighted test, and the linear spring and damper model
was the most human-like according to the Noise test.

**Discussion **

In this study, we considered three versions of a Turing-like handshake test: Pure, Weighted, and Noise. In all these tests, a human interrogator interacts with different combinations of pairs of human, computer, or combined handshakes, and is asked which handshake felt more human. We presented a simulation study that quantified the differences between these tests in their ability to assess the human-likeness of computer-generated handshakes. We concluded the paper with an experimental demonstration of testing the human-likeness of three simple models for the human handshake.

The simulation study suggests that the choice of the best test to differentiate the human-likeness of computer-generated handshakes depends on how well the interrogator identifies a human handshake when compared with a model, namely, the decision variance of the interrogator. The Pure test is better for intermediate and large levels of interrogator confusion, and the Weighted and Noise tests are good for low levels of confusion. While it seems that the Noise test outperforms the Weighted test, an important condition must be satisfied before an effective Noise test can be implemented: we have to define the noise function—the least human-like handshake possible. Therefore, the Weighted test is necessary at least for finding a model that is far enough from a human handshake to serve as noise.

In our simulation study, we assumed that the 1D internal representation of handshake human-likeness has a Gaussian distribution, and that for all computer, human, and combined handshakes, and for all subjects, this distribution has an identical standard deviation. These assumptions are probably not correct; for example, the assumption of constant variance does not take into account the Weber and Fechner laws (Norwich, 1987). We also did not take into account the possibility that the decision process concerning the level of human-likeness of a handshake has multiplicative rather than additive noise, or a particular structure of observer model (Lu & Dosher, 2008). In order to properly take these properties into account, we would have to choose the observer model (Lu & Dosher), the appropriate power function that relates the actual level of the stimulus to the perceived human-likeness, and even decide whether such a function exists (Stevens, 1957). Since there is no established psychophysical characterization of handshake perception, we chose to start with the simplest assumptions. With future progress in the psychophysical evaluation of the human-likeness of computer-generated handshakes, these assumptions will probably be revised, and additional methodological progress will be possible based on more true-to-life formulations.

According to our experimental results, the Pure test
was not successful in discrimination of human-likeness of
the linear spring, linear spring and damper, and mixture
of sinusoids models for human handshake. This implies
that the decision variance of the interrogator is low, and
that the suggested models are far from being human-like. Therefore, when the interrogator is asked to compare a human handshake and a model handshake, he or
she mostly chooses the correct answer. However, consistent with our simulations, both the Weighted and Noise tests successfully discriminated between these simple models. We expect that when we test handshake models that are more human-like, the Pure test will become more effective for discriminating between them and for identifying the most human-like handshake
model. This observation suggests an additional methodological recommendation: for each new set of models,
it is useful to perform a pilot study with a small number
of subjects but using all three Turing-like tests. The
results of these tests taken together can be used as an operational estimation of the level of confusion of the
interrogator with respect to the human-likeness of the
tested models. For example, models that yield a consistent estimation of MHLG_{p} = 0, MHLG_{w} = 1, and 0 <
MHLG_{n} < 1 indicate a low level of interrogator confusion, and a high level of human-likeness.

Interestingly, while in our simulation study the grading of different models was consistent between the Weighted and Noise tests, this was not the case in the experimental study. The linear spring model was consistently perceived as the least human-like model, but there was a disagreement about the human-likeness of the linear spring and damper and mixture of sinusoids models. One possible explanation for this observation is that the internal representation of human-likeness is multidimensional. Each interrogator might follow a different decision path in the space of human-likeness features when grading the models. An example of such a situation is when the human-likeness is determined according to a weighted average of the different features, as depicted in Figure 14. According to this view, all the interrogators would correctly identify the least and the most human-like possible handshakes, but may have various opinions about the salient feature characterizing human-likeness.
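The weighted-average account depicted in Figure 14 can be illustrated with a toy computation (hypothetical code; the feature names and the weighting scheme are ours, not the paper's):

```python
import numpy as np

def human_likeness_score(features, weights):
    """Hypothetical multidimensional evaluation: each interrogator
    weights human-likeness features (e.g., synchronization, activity)
    differently, and the scalar grade is their weighted average."""
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(features @ weights / weights.sum())
```

Two interrogators with different weight vectors can then rank the same pair of models in opposite orders, while still agreeing on handshakes that score extremely on every feature.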

In particular, the linear spring and damper and mixture of sinusoids represent two different classes of models, a passive linear system, and an active stochastic force
generator, respectively. A priori, it is difficult to predict
which class is expected to be more human-like. A passive,
linear system creates forces only in response to the movement of the interrogator, and the frequency content of
the resultant handshake never contains frequencies that
did not exist in this movement. Hence, the resultant
handshake is highly synchronized, but also very predictable. Such a handshake would feel natural to an interrogator who is used to dominating handshakes, and who always takes the leader role in a handshake. An active stochastic force generator introduces unpredicted frequency
content, and initiates interaction even if the interrogator
does not do so. Thus, such a handshake might feel more
human-like to an interrogator who is used to following
and complying with the opponent’s movement
during everyday handshakes. However, it might feel out
of sync and unpleasant if the interrogator tries to lead
the handshake. These two features could be examples for
different dimensions of the overall human-likeness representation that were suggested in the previous paragraph,
and the weighting between these features could be
determined by the dominance of the interrogator in
physical interactions (Groten *et al*., 2009). In future
studies, these two features could be combined into one
model of a handshake. In addition, it might be beneficial
to identify the dominance of the interrogator in collaborative tasks in order to adjust the specific weight of active
and passive components in the handshake.

To further improve future models of handshake, it
can be useful to include a few theories about the nature
of the control of human movements. For example, the
linear spring and damper system can be replaced with a
Hill-type mechanical model (Karniel & Inbar, 1997) or
one-fifth power damping (Barto, Fagg, Sitkoff, & Houk,
1999). For the class of active models, it can be interesting to consider using trajectories that are the result of
optimization with respect to some cost function, for
example, minimum jerk (Flash & Hogan, 1985), or minimum acceleration with constraints (Ben-Itzhak &
Karniel, 2007), instead of the simple sinusoids that we
used in the current study. In addition, it is possible to incorporate forward models for controlling the grip
force (Flanagan & Wing, 1997; Kawato, 1999) into the
construction of control law, or to adopt optimal feedback control strategies (Todorov & Jordan, 2002) and
adaptive control. In Avraham *et al*. (in press), we
utilized the Noise Turing-like handshake test, and compared the human-likeness of three models that were
based on different aspects of the human control of
motion.

The difference in grading might be the result of the
subjective and declarative nature of the test. In order to
overcome the differences between the cognitive processes across subjects, it can be useful to look at objective,
physiologically related, measures, such as skin conductance response (Laine, Spitler, Mosher, & Gothard,
2009), heart rate (Anttonen & Surakka, 2005), postural
responses (Freeman, Avons, Meddis, Pearson, & IJsselsteijn, 2000), or task performance (Feth *et al*., 2011;
Schloerb, 1995). This is of special importance, as declarative perception is not always consistent with motor
responses (Aglioti, DeSouza, & Goodale, 1995; Ganel &
Goodale, 2003; Goodale & Milner, 1992; Nisky,
Pressman, Pugh, Mussa-Ivaldi, & Karniel, 2011; Pressman, Nisky, Karniel, & Mussa-Ivaldi, 2008). In particular, a declarative, subjective, evaluation of presence in virtual and remote environments was shown to be
unreliable, and behavioral, objective, presence measures,
such as postural responses, were suggested (Freeman *et al*.). In the context of human–robot interaction, Reed
and Peshkin (2008) showed that while participants who
interacted with a robotic partner reported that they
interacted with a human in the verbal Turing test, they
did not reach the same level of performance as in the
human–human dyad.

The use of virtual reality, telepresence, and teleoperation systems for the study of perception has been growing over the last few years (Jin, 2011; Zaal & Bootsma,
2011). In a recent work (Feth *et al.*, 2011), human–robot interaction in a virtual environment was studied,
and the human-likeness of virtual partners with a predetermined or adaptive collaborative behavior was evaluated. They developed two psychophysical tests using a
predefined scale or a pair-wise comparison, to assess the
similarity of the virtual partner to a human partner in terms of haptic perception. Our Pure test resembles their
pair-wise comparison approach, but in our test, we compare each handshake model only to a human handshake,
while they applied Thurstone’s law of comparative judgment, Case V, and performed all possible paired comparisons among the different virtual opponents, as well as a random and a human opponent. Both of these approaches
are based on Thurstonian scaling and SDT (MacMillan,
2002), but differ in the overall number of comparisons.
While our method is more economical in terms of experimental burden, as it uses a minimal number of comparisons, the method of Feth *et al.* provides a more direct
assessment of the relative human-likeness of each pair of
models, and, hence, provides a more accurate estimation.
In a future study, it will be interesting to compare these
two approaches in a single experiment with an identical
number of overall comparisons and assess the statistical
power of each of the methods in discrimination of
human-likeness.
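
For the interested reader, the Thurstone Case V scaling mentioned above can be sketched in a few lines of Python (a generic illustration of the technique under standard Case V assumptions, not the implementation of Feth *et al.*): given the proportion of comparisons in which each stimulus was judged more human-like than each other stimulus, the scale value of a stimulus is the mean of the z-transformed proportions of comparisons it wins.

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(P):
    """Thurstone Case V scale values from a paired-comparison matrix.

    P[i, j] = proportion of comparisons in which stimulus j was judged
    more human-like than stimulus i. The scale value of stimulus j is
    the column mean of the z-transformed proportions; the diagonal is
    set to 0.5 so that z(0.5) = 0 contributes nothing.
    """
    P = np.array(P, dtype=float)          # copy, leave caller's data intact
    np.fill_diagonal(P, 0.5)
    Z = norm.ppf(np.clip(P, 0.01, 0.99))  # clip to avoid infinite z-scores
    return Z.mean(axis=0)                 # one scale value per stimulus

# Hypothetical 3-model example: model 2 wins most of its comparisons.
P = [[0.5, 0.6, 0.8],
     [0.4, 0.5, 0.7],
     [0.2, 0.3, 0.5]]
scales = thurstone_case_v(P)
```

For a complete comparison matrix (p_ij + p_ji = 1), the z-matrix is antisymmetric, so the scale values sum to zero; only their differences are meaningful.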

There are two fundamentally different approaches to
the challenge of quantifying the perceived human-likeness of a particular model for the handshake. One is to
present the participants with various handshakes, and
ask for a quantitative grade on some predefined scale
(Feth *et al.*, 2011; Ikeura *et al.*, 1999). The other is to
use a 2AFC method: present the participant with two
handshakes and ask which one is more human-like (Feth *et al.*; Karniel, Avraham, *et al.*, 2010). The main advantage of the latter approach is that it allows us to treat the
problem within the well-studied signal detection theory
(Abdi, 2007b; Gescheider, 1985; Lu & Dosher, 2008;
MacMillan, 2002), and use the methodological tools
that were developed within this framework, for example,
fitting psychometric curves to the answers of participants, and assessing perception thresholds and discrimination sensitivity. The 2AFC method followed by fitting of psychometric curves is used extensively in haptics research, for example, to assess the perception
of height (Ernst & Banks, 2002), shape (Helbig &
Ernst, 2007), stiffness (Nisky, Baraduc, & Karniel,
2010; Nisky, Mussa-Ivaldi, & Karniel, 2008; Pressman *et al.*, 2008; Pressman, Welty, Karniel, & Mussa-Ivaldi,
2007), and more. The combination is also a very
prominent technique for exploring perception in other modalities, such as audition (Warren, Uppenkamp, Patterson, & Griffiths, 2003), vision (Hoffman, Girshick, Akeley, & Banks, 2008), and olfaction (Uchida & Mainen, 2003). Importantly, the method is used not only for assessing discrimination in pure sensory modalities, but also for quantifying cognitive representations, such as in
the case of perception of numerical information in monkeys (Nieder & Miller, 2004), the effect of linguistic
perception of motion verbs on perception of motion
(Meteyard, Bahrami, & Vigliocco, 2007), or recognition of emotions (Pollak, Messner, Kistler, & Cohn,
2009).
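
As a concrete sketch of the psychometric-curve fitting described above (our own illustration; the stimulus levels and response proportions below are invented, not experimental data), a cumulative Gaussian can be fit to the proportion of "human-like" answers as a function of a stimulus parameter, such as the weight of the human component in the Weighted test:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(x, mu, sigma):
    """Cumulative-Gaussian psychometric function."""
    return norm.cdf(x, loc=mu, scale=sigma)

# Hypothetical 2AFC data: stimulus levels (e.g., weight of the human
# component) and the proportion of "human-like" answers at each level.
levels = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
p_human = np.array([0.05, 0.15, 0.40, 0.65, 0.85, 0.95])

(mu, sigma), _ = curve_fit(psychometric, levels, p_human, p0=[0.5, 0.2])
```

The fitted mu estimates the level at which the interrogator answers "human-like" half the time, and sigma quantifies the steepness of the transition, that is, the discrimination sensitivity.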

In the current study, we present three versions of the Turing-like handshake test. These tests complement each other in their ability to discriminate between the human-likeness of different computer models for different levels of confusion of the human interrogator. In our experimental study, we focused on a reduced version of a handshake: a 1D point interaction through a robotic handle. This approach allows for an exploration of the simple characteristics of human motion that contribute to the perception of human-likeness. In the next step, additional aspects of human-likeness should be explored, both within and outside of the haptic modality, such as grip force, texture, temperature, and moisture, as well as vision and sound.

We believe that the current study provides an important step in the process of building human-like humanoid robots, and will help to facilitate the development of natural human–robot interaction, with promising applications in rehabilitation, teleoperation, and telepresence.

**Acknowledgments**

This work was supported by the Israel Science Foundation Grant number 1018/08. Ilana Nisky was supported by the Kreitman and Clore foundations.


*This article was originally published in the spring 2012 issue of the journal* Presence.

**References**

Abdi, H. (2007a). The Bonferroni and Sidak corrections for
multiple comparisons. In N. Salkind (Ed.), *Encyclopedia of
measurement and statistics*. Thousand Oaks, CA: Sage.

Abdi, H. (2007b). Signal detection theory. In N. Salkind
(Ed.), *Encyclopedia of measurement and statistics*. Thousand
Oaks, CA: Sage.

Aglioti, S., DeSouza, J. F. X., & Goodale, M. A. (1995). Size-contrast illusions deceive the eye but not the hand. *Current
Biology,* 5(6), 679–685.

Anttonen, J., & Surakka, V. (2005). Emotions and heart rate while sitting on a chair. Paper presented at the SIGCHI Conference on Human Factors in Computing Systems, Portland, Oregon.

Avraham, G., Levy-Tzedek, S., & Karniel, A. (2009). Exploring the rhythmic nature of handshake movement and a Turing-like test. Paper presented at the Fifth Computational Motor Control Workshop, Beer-Sheva, Israel.

Avraham, G., Levy-Tzedek, S., Peles, B.-C., Bar-Haim, S., & Karniel, A. (2010). Reduced frequency variability in handshake movements of individuals with cerebral palsy. Paper presented at the Sixth Computational Motor Control Workshop, Beer-Sheva, Israel.

Avraham, G., Nisky, I., Fernandes, H., Acuna, D., Kording,
K., Loeb, G., & Karniel, A. (in press). Towards perceiving
robots as humans—Three handshake models face the
Turing-like handshake test. *IEEE Transactions on Haptics.*
doi:10.1109/TOH.2012.16.

Avraham, G., Nisky, I., & Karniel, A. (2011). When robots become humans: A Turing-like handshake test. Paper presented at the CMCW7, Seventh Annual Computational Motor Control Workshop at Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Bailenson, J. N., & Yee, N. (2007). Virtual interpersonal touch
and digital chameleons. *Journal of Nonverbal Behavior*,
31(4), 225–242.

Bailenson, J. N., & Yee, N. (2008). Virtual interpersonal
touch: Haptic interaction and copresence in collaborative
virtual environments. *Multimedia Tools and Applications*,
37(1), 5–14. doi:10.1007/s11042-007-0171-2

Bailenson, J. N., Yee, N., Brave, S., Merget, D., & Koslow, D.
(2007). Virtual interpersonal touch: Expressing and recog-
nizing emotions through haptic devices. *Human–Computer
Interaction*, 22(3), 325–353.

Barto, A. G., Fagg, A. H., Sitkoff, N., & Houk, J. C. (1999).
A cerebellar model of timing and prediction in the control of
reaching. *Neural Computation*, 11(3), 565–594.
doi:10.1162/089976699300016575

Ben-Itzhak, S., & Karniel, A. (2007). Minimum acceleration
criterion with constraints implies bang-bang control as an
underlying principle for optimal trajectories of arm reaching movements. *Neural Computation*, 20(3), 779–812.
doi:10.1162/neco.2007.12-05-077

Biggs, J., & Srinivasan, M. A. (2002). Haptic interfaces. In
K. Stanney (Ed.), *Handbook of virtual environments*
(pp. 93–115). Mahwah, NJ: Lawrence Erlbaum.

Chaplin, W. F., Phillips, J. B., Brown, J. D., Clanton, N. R., &
Stein, J. L. (2000). Handshaking, gender, personality, and
first impressions. *Journal of Personality and Social Psychology*,
79(1), 110–117.

Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.

Cohen, J. (1992). A power primer. *Psychological Bulletin*,
112(1), 155–159.

Durlach, N., & Slater, M. (2000). Presence in shared virtual
environments and virtual togetherness. *Presence: Teleoperators and Virtual Environments*, 9(2), 214–217.
doi:10.1162/105474600566736

El Saddik, A. (2007). The potential of haptics technologies. *IEEE
Instrumentation & Measurement Magazine*, 10(1), 10–17.

Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual
and haptic information in a statistically optimal fashion.
*Nature*, 415(6870), 429–433.

Feth, D., Groten, R., Peer, A., & Buss, M. (2011). Haptic
human–robot collaboration: Comparison of robot partner
implementations in terms of human-likeness and task performance. *Presence: Teleoperators and Virtual Environments*,
20(2), 173–189. doi:10.1162/pres_a_00042

Flanagan, J. R., & Wing, A. M. (1997). The role of internal
models in motion planning and control: Evidence from grip
force adjustments during movements of hand-held loads.
*The Journal of Neuroscience*, 17(4), 1519–1528.

Flash, T., & Hogan, N. (1985). The coordination of arm
movements: An experimentally confirmed mathematical
model. *Journal of Neuroscience*, 5(7), 1688–1703.

Freeman, J., Avons, S. E., Meddis, R., Pearson, D. E., & IJsselsteijn, W. (2000). Using behavioral realism to estimate presence: A study of the utility of postural responses to motion
stimuli. *Presence: Teleoperators and Virtual Environments*,
9(2), 149–164. doi:10.1162/105474600566691

Friedman, M. (1937). The use of ranks to avoid the assump-
tion of normality implicit in the analysis of variance. *Journal
of the American Statistical Association*, 32(200), 675–701.

Friedman, R., Hester, K., Green, B., & LaMotte, R. (2008).
Magnitude estimation of softness. *Experimental Brain
Research*, 191(2), 133–142.

Ganel, T., & Goodale, M. A. (2003). Visual control of action
but not perception requires analytical processing of object
shape. *Nature,* 426(6967), 664–667.

Garcia-Perez, M. A., & Alcala-Quintana, R. (2011). Improving
the estimation of psychometric functions in 2AFC discrimi-
nation tasks. *Frontiers in Psychology*, 2(96), 1–9.
doi:10.3389/fpsyg.2011.00096

Gentry, S., Feron, E., & Murray-Smith, R. (2005). Human–human haptic collaboration in cyclical Fitts’ tasks. Paper presented at the IROS 2005, IEEE/RSJ International Conference on Intelligent Robots and Systems.

Gescheider, G. A. (1985). *Psychophysics: Method, theory, and application.*
Mahwah, NJ: Lawrence Erlbaum.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. *Trends in Neurosciences*,
15(1), 20–25.

Groten, R., Feth, D., Goshy, H., Peer, A., Kenny, D. A., & Buss, M. (2009). Experimental analysis of dominance in haptic collaboration. Paper presented at the RO-MAN 2009, the 18th IEEE International Symposium on Robot and Human Interactive Communication.

Hannaford, B. (1989). Stability and performance tradeoffs in bilateral telemanipulation. Paper presented at the ICRA ’89, IEEE International Conference on Robotics and Automation, Scottsdale, AZ.

Helbig, H., & Ernst, M. (2007). Optimal integration of shape
information from vision and touch. *Experimental Brain
Research*, 179(4), 595–606. doi:10.1007/s00221-006-
0814-y

Hespanha, J. P., McLaughlin, M., Sukhatme, G. S., Akbarian, M., Garg, R., & Zhu, W. (2000). Haptic collaboration over the internet. Paper presented at the Fifth PHANToM Users Group Workshop.

Hinds, P. J., Roberts, T. L., & Jones, H. (2004). Whose job is
it anyway? A study of human–robot interaction in a collaborative task. *Human–Computer Interaction*, 19(1), 151–181.
doi:10.1207/s15327051hci1901&2_7

Hoffman, D. M., Girshick, A. R., Akeley, K., & Banks, M. S.
(2008). Vergence-accommodation conflicts hinder visual
performance and cause visual fatigue. *Journal of Vision*,
8(3). doi:10.1167/8.3.33

Ikeura, R., Inooka, H., & Mizutani, K. (1999). Subjective evaluation for maneuverability of a robot cooperating with human. Paper presented at the RO-MAN 1999, 8th IEEE International Workshop on Robot and Human Interaction.

Jin, S.-A. A. (2011). “It feels right. Therefore, I feel present
and enjoy’’: The effects of regulatory fit and the mediating roles of social presence and self-presence in avatar-based 3D
virtual environments. *Presence: Teleoperators and Virtual
Environments*, 20(2), 105–116.

Jindai, M., Watanabe, T., Shibata, S., & Yamamoto, T. (2006). Development of handshake robot system for embodied interaction with humans. Paper presented at the 15th IEEE International Symposium on Robot and Human Interactive Communication, Hatfield, UK.

Jones, L. A., & Hunter, I. W. (1990). A perceptual analysis of
stiffness. *Experimental Brain Research*, 79(1), 150–156.

Jones, L. A., & Hunter, I. W. (1993). A perceptual analysis of
viscosity. *Experimental Brain Research*, 94(2), 343–351.

Karniel, A. (2010). A Turing-like handshake test for motor intelligence. Paper presented at the 20th Annual Meeting of the Society for Neural Control of Movement, Naples, Florida.

Karniel, A., Avraham, G., Peles, B.-C., Levy-Tzedek, S., & Nisky,
I. (2010). One dimensional Turing-like handshake test for
motor intelligence. *Journal of Visualized Experiments*, 46, e2492.

Karniel, A., & Inbar, G. F. (1997). A model for learning
human reaching movements. *Biological Cybernetics*, 77(3),
173–183. doi:10.1007/s004220050378

Karniel, A., & Mussa-Ivaldi, F. A. (2003). Sequence, time, or
state representation: How does the motor control system
adapt to variable environments? *Biological Cybernetics*, 89(1),
10–21.

Karniel, A., Nisky, I., Avraham, G., Peles, B.-C., & Levy-
Tzedek, S. (2010). A Turing-like handshake test for motor
intelligence. In A. Kappers, J. van Erp, W. Bergmann Tiest, &
F. van der Helm (Eds.), *Haptics: Generating and perceiving
tangible sensations* (Vol. 6191, pp. 197–204). Berlin:
Springer.

Kasuga, T., & Hashimoto, M. (2005). Human–robot handshaking using neural oscillators. Paper presented at the International Conference on Robotics and Automation, Barcelona, Spain.

Kawato, M. (1999). Internal models for motor control and trajectory planning. *Current Opinion in Neurobiology*, 9(6),
718–727. doi:10.1016/s0959-4388(99)00028-8

Kim, J., Kim, H., Tay, B. K., Muniyandi, M., Srinivasan, M. A.,
Jordan, J., . . . Slater, M. (2004). Transatlantic touch: A
study of haptic collaboration over long distance. *Presence:
Teleoperators and Virtual Environments*, 13(3), 328–337.
doi:10.1162/1054746041422370

Kunii, Y., & Hashimoto, H. (1995). Tele-handshake using HandShake device. Paper presented at the 1995 IEEE IECON, 21st International Conference on Industrial Electronics, Control, and Instrumentation.

Laine, C. M., Spitler, K. M., Mosher, C. P., & Gothard, K. M.
(2009). Behavioral triggers of skin conductance responses
and their neural correlates in the primate amygdala. *Journal
of Neurophysiology*, 101(4), 1749–1754. doi:10.1152/
jn.91110.2008

Lu, Z.-L., & Dosher, B. A. (2008). Characterizing observers
using external noise and observer models: Assessing internal
representations with external noise. *Psychological Review,*
115(1), 44–82. doi:10.1037/0033-295x.115.1.44

MacMillan, N. A. (2002). Signal detection theory. In *Stevens’
handbook of experimental psychology*. New York: John Wiley.

McLaughlin, M., Sukhatme, G., Wei, P., Weirong, Z., & Parks, J. (2003). Performance and co-presence in heterogeneous haptic collaboration. Paper presented at the HAPTICS 2003, 11th Symposium on Haptic Interfaces for Virtual Environments and Teleoperator Systems.

Meteyard, L., Bahrami, B., & Vigliocco, G. (2007). Motion
detection and motion verbs. *Psychological Science*, 18(11),
1007–1013. doi:10.1111/j.1467-9280.2007.02016.x

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric
functions: The Spearman–Kärber method. *Attention, Perception, & Psychophysics*, 63(8), 1399–1420. doi:10.3758/
bf03194551

Miyashita, T., & Ishiguro, H. (2004). Human-like natural
behavior generation based on involuntary motions for
humanoid robots. *Robotics and Autonomous Systems*, 48(4),
203–212.

Morasso, P. (1981). Spatial control of arm movements. *Experimental Brain Research*, 42(2), 223–227.

Nieder, A., & Miller, E. K. (2004). Analog numerical representations in Rhesus monkeys: Evidence for parallel processing.
*Journal of Cognitive Neuroscience*, 16(5), 889–901.
doi:10.1162/089892904970807

Niemeyer, G., & Slotine, J.-J. E. (2004). Telemanipulation
with time delays. *The International Journal of Robotics
Research*, 23(9), 873–890. doi:10.1177/
0278364904045563

Nisky, I., Baraduc, P., & Karniel, A. (2010). Proximodistal gradient in the perception of delayed stiffness. *Journal of Neurophysiology*, 103(6), 3017–3026.

Nisky, I., Mussa-Ivaldi, F. A., & Karniel, A. (2008). A regression
and boundary-crossing-based model for the perception of
delayed stiffness. *IEEE Transactions on Haptics*, 1(2), 73–82.

Nisky, I., Pressman, A., Pugh, C. M., Mussa-Ivaldi, F. A., &
Karniel, A. (2011). Perception and action in teleoperated
needle insertion. *IEEE Transactions on Haptics*, 4(3),
155–166.

Norwich, K. (1987). On the theory of Weber fractions. *Attention, Perception, & Psychophysics*, 42(3), 286–298.
doi:10.3758/bf03203081

Okamura, A., Verner, L., Reiley, C., & Mahvash, M. (2011).
Haptics for robot-assisted minimally invasive surgery. In M.
Kaneko & Y. Nakamura (Eds.), *Robotics research* (Vol. 66,
pp. 361–372). Berlin: Springer.

Ouchi, K., & Hashimoto, S. (1997). Handshake telephone system to communicate with voice and force. Paper presented at the IEEE International Workshop on Robot and Human Communication.

Pollak, S. D., Messner, M., Kistler, D. J., & Cohn, J. F.
(2009). Development of perceptual expertise in emotion recognition. *Cognition*, 110(2), 242–247. doi:10.1016/
j.cognition.2008.10.010

Pressman, A., Nisky, I., Karniel, A., & Mussa-Ivaldi, F. A.
(2008). Probing virtual boundaries and the perception of
delayed stiffness. *Advanced Robotics*, 22, 119–140.

Pressman, A., Welty, L. J., Karniel, A., & Mussa-Ivaldi,
F. A. (2007). Perception of delayed stiffness. *The
International Journal of Robotics Research*, 26(11–12),
1191–1203.

Rahman, M. M., Ikeura, R., & Mizutani, K. (2002). Investigation of the impedance characteristic of human arm for development of robots to cooperate with humans. *JSME International Journal Series C: Mechanical Systems, Machine Elements and Manufacturing*, 45(2), 510–518.

Reed, K. B., & Peshkin, M. A. (2008). Physical collaboration
of human–human and human–robot teams. *IEEE Transactions on Haptics*, 1(2), 108–120.

Rönnqvist, L., & Rösblad, B. (2007). Kinematic analysis of
unimanual reaching and grasping movements in children
with hemiplegic cerebral palsy. *Clinical Biomechanics*, 22(2),
165–175.

Sato, T., Hashimoto, M., & Tsukahara, M. (2007). Synchronization based control using online design of dynamics and its application to human–robot interaction. Paper presented at the 2007 IEEE International Conference on Robotics and Biomimetics.

Schloerb, D. W. (1995). A quantitative measure of tele-
presence. *Presence: Teleoperators and Virtual Environments*,
4(1), 64–80.

Shadmehr, R., & Mussa-Ivaldi, F. A. (1994). Adaptive repre-
sentation of dynamics during learning of a motor task. *Journal of Neuroscience*, 14(5), 3208–3224.

Shadmehr, R., & Wise, S. P. (2005). *The computational neurobiology of reaching and pointing: A foundation for motor
learning*. Cambridge, MA: MIT Press.

Sheridan, T. B. (1994). Further musings on the psychophysics of presence. Paper presented at the Humans, Information and Technology, 1994 IEEE International Conference on Systems, Man, and Cybernetics.

Sheridan, T. B. (1996). Further musings on the psychophysics
of presence. *Presence: Teleoperators and Virtual Environments*, 5(2), 241–246.

Srinivasan, M. A., & LaMotte, R. H. (1995). Tactual discrimi-
nation of softness. *Journal of Neurophysiology*, 73(1),
88–101.

Stevens, S. S. (1957). On the psychophysical law. *Psychological
Review*, 64(3), 153–181. doi:10.1037/h0046162

Stewart, G. L., Dustin, S. L., Barrick, M. R., & Darnold, T. C.
(2008). Exploring the handshake in employment interviews.
*Journal of Applied Psychology*, 93(5), 1139–1146.
doi:10.1037/0021-9010.93.5.1139

Tan, H. Z., Durlach, N. I., Beauregard, G., & Srinivasan, M.
A. (1995). Manual discrimination of compliance using active
pinch grasp: The roles of force and work cues. *Perception &
Psychophysics*, 57(4), 495–510.

Todorov, E., & Jordan, M. I. (2002). Optimal feedback con-
trol as a theory of motor coordination. *Nature Neuroscience*,
5(11), 1226–1235. doi:10.1038/nn963

Turing, A. M. (1950). Computing machinery and intelligence. *Mind, A Quarterly Review of Psychology and Philosophy*, LIX(236), 433–460.

Uchida, N., & Mainen, Z. F. (2003). Speed and accuracy of olfactory discrimination in the rat. *Nature Neuroscience*, 6(11), 1224–1229. doi:10.1038/nn1142

van den Berg, C. (2000). Coordination disorders in patients with Parkinson’s disease: A study of paced rhythmic forearm movements. *Experimental Brain Research*, 134(2), 174–186.

van der Heide, J. C., Fock, J. M., Otten, B., Stremmelaar, E., & Hadders-Algra, M. (2005). Kinematic characteristics of
reaching movements in preterm children with cerebral palsy.
*Pediatric Research*, 57(6), 883.

Wang, Z., Lu, J., Peer, A., & Buss, M. (2010). Influence of
vision and haptics on plausibility of social interaction in vir-
tual reality scenarios. In A. Kappers, J. van Erp, W. Berg-
mann Tiest, & F. van der Helm (Eds.), *Haptics: Generating
and perceiving tangible sensations* (Vol. 6192, pp. 172–177).
Berlin: Springer.

Wang, Z., Peer, A., & Buss, M. (2009). An HMM approach to realistic haptic human–robot interaction. Paper presented at the World Haptics 2009, Third Joint EuroHaptics Conference, and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems.

Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths,
T. D. (2003). Separating pitch chroma and pitch height in
the human brain. *Proceedings of the National Academy of Sciences of the United States of America*, 100(17), 10038–
10042. doi:10.1073/pnas.1730682100

Wichmann, F., & Hill, N. (2001). The psychometric function:
I. Fitting, sampling, and goodness of fit. *Perception & Psychophysics*, 63(8), 1293–1313.

Wolpert, D. M., & Ghahramani, Z. (2000). Computational
principles of movement neuroscience. *Nature Neuroscience*,
3, 1212–1217.

Yokokohji, Y., & Yoshikawa, T. (1994). Bilateral control of
master–slave manipulators for ideal kinesthetic coupling—
Formulation and experiment. *IEEE Transactions on Robotics
and Automation*, 10(5), 605–620.

Zaal, F. T. J. M., & Bootsma, R. J. (2011). Virtual reality as a
tool for the study of perception-action: The case of running
to catch fly balls. *Presence: Teleoperators and Virtual Environments*, 20(1), 93–103.