Background to the Development of Behavioural Questionnaires

Where did we begin?
From our inception in 1972 we used a number of proprietary psychometric questionnaires, including the 16PF, the Myers-Briggs Type Indicator and the OPQ, in our workshops and for personal development activities. During this time workshop participants and their senior managers increasingly asked for questionnaires that were more focused on behaviour. Most of them understood personality theory and that a person's actual behaviour emerges through a range of social attributes. It was therefore recognised and accepted that how individuals saw themselves, in terms of their personality and associated behaviour, could be quite different from the behaviour they actually employed in different settings, depending on the factors present in the situation. We were therefore asked to develop questionnaires that were behaviourally focused and whose outputs did not need to be interpreted by a specialist.

In 1982 Tom Jaap developed the Image<>Impact behavioural questionnaire based on the result of working with over 4000 managers, team leaders and team members from a number of different industries and countries.

Creating the Image<>Impact Questionnaire
The Image<>Impact questionnaire was constructed on a four-style model which had significant acceptance in the industries in which we were involved. The questions selected were the outcome of several in-depth studies involving the observation of on-the-job behaviour of a range of leaders, from junior to the most senior managers. The studies included shadowing managers for several days to observe the behaviour they used in a range of 'normal' situations, from regular meetings to one-on-one interactions with direct reports, peers and more senior managers. A database of over 300 behavioural statements was generated, with approximately equal numbers in each of the quadrants. A pilot questionnaire containing 96 pairs of behavioural statements was developed using a triangulation process that compared the characteristics of one quadrant against the characteristics of the other three.

The questionnaire was piloted in 12 workshops with a total of 238 participants over a period of 8 months. The feedback from the questionnaires was produced manually and given to the participants during each workshop. Following the workshops the participants were asked to discuss the feedback with their team members and manager to test how well it fitted their actual behaviour. We also undertook several reality testing processes during this time. Using these, together with the feedback from over 60% of the participants, we modified the questionnaire to include 48 statements that proved sufficiently differentiated to make the outputs statistically robust. The resulting Image<>Impact questionnaire has been used by over 20,000 participants since its modified format emerged in 1983.

Over the years since then it has been modified and updated to take account of changing habits and behaviours. From the mid-1980s we started to develop other behavioural questionnaires in response to requests from clients, and we now have a suite of over 20 behavioural questionnaires, of which 3 are included in our Betterlifetoolkit.com site. In 1990 we started using the Image<>Impact questionnaire as a 360-degree feedback instrument that gave individuals feedback on how others perceived them. This provided an excellent tool for testing the accuracy of their self-perception.

Providing useful feedback
Historically, the primary use of our behavioural questionnaires has been to provide feedback for participants attending in-house facilitated workshops. However, an increasing number of people wanted access to the questionnaires for their own use. They also wanted the feedback to be quick and in a form that was easily understandable without the need for a facilitator to explain the output.

This was the stimulus to develop the Betterlifetoolkit site with its current set of 3 behavioural questionnaires focused around our Colours concept. Colour appears to be a much easier metaphor for most people to relate to, and it goes a long way toward stimulating their desire to explore how they can improve their self-awareness and self-management skills.

This has encouraged us to formalise our Colours metaphor as the starting point in helping people gain insights into their preferred behaviour and an understanding of the effect that it has on those with whom they interact.

Our aim is to assist people in moving through different levels of interpersonal and intrapersonal skill building in a way that enables them to achieve their dreams and improve their lives by obtaining the most effective results from every interaction and relationship they have.

Questionnaire Testing Processes Employed

1. Reliability
1.1 Test-Retest Reliability
During the stages of developing and refining each questionnaire we focused on measuring the reliability of the results. The method we used to determine reliability was to administer the same questionnaire twice and obtain a correlation between the two sets of results. Because we have long-term relationships with clients, we are afforded the opportunity to conduct before and after tests on participants involved in change strategies. All the managers tested were involved in workshops designed to introduce changes ranging from culture through to new processes and practices. We tested the participants at the start of the first workshop and then retested them at another workshop at least 9 months later. A reasonable interval between the two measurements was important because it reduces the possibility that participants simply reproduce their earlier responses from memory. During a period of four years we employed this test with over 1800 participants.

We ensured that the same questionnaire was used each time, which makes a negative correlation unlikely, although a correlation can in principle fall anywhere in the range -1.00 to +1.00. A negative correlation means that as one measure increases the other reduces. On occasions the correlation could, however, be close to zero, meaning the two measures were largely unrelated. In our testing process, the closer the correlation came to +1.00, the more reliable we judged the measure to be.
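The correlation between a first and a second administration can be illustrated with a short Python sketch. It assumes the Pearson correlation coefficient is the statistic in question, which the text does not state; the function name and the scores are hypothetical and are not taken from our data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squares of deviations from the means
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    ss_a = sum((a - mean_a) ** 2 for a in first)
    ss_b = sum((b - mean_b) ** 2 for b in second)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_a * ss_b)


# Hypothetical scale scores for five participants (illustrative only)
test_scores = [12, 15, 9, 20, 17]      # first workshop
retest_scores = [13, 14, 10, 19, 18]   # workshop at least 9 months later

r = pearson_r(test_scores, retest_scores)
print(f"test-retest correlation: {r:.2f}")
```

A value near +1.00 indicates that participants kept roughly the same rank order across the two administrations. Reversing one list would give a value near -1.00.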

This "test-retest reliability" procedure is a robust check of reliability and was carried out using a number of relatively large participant groups. The process enabled us to separate the questions that produced a weaker correlation, of less than .4, from those giving a higher score. From this we produced a set of questions based on sound behavioural examples in which all the correlations were statistically significant. Some questions were clearly stronger than others, with correlations of between .6 and .75, but the full questionnaire emerged with a high level of reliability in its outputs.
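The screening step described above can be sketched in Python as follows. The .4 cut-off comes from the text; the function name, the statement labels and the correlation values are hypothetical and serve only as an illustration.

```python
RETEST_THRESHOLD = 0.4  # cut-off quoted in the text


def screen_items(correlations, threshold=RETEST_THRESHOLD):
    """Split items into those at or above the threshold and those below it."""
    retained = {item: r for item, r in correlations.items() if r >= threshold}
    flagged = {item: r for item, r in correlations.items() if r < threshold}
    return retained, flagged


# Hypothetical test-retest correlations per statement (illustrative only)
item_correlations = {"S01": 0.72, "S02": 0.35, "S03": 0.61, "S04": 0.18, "S05": 0.44}

retained, flagged = screen_items(item_correlations)
print("retained:", sorted(retained))
print("flagged for review:", sorted(flagged))
```

In this sketch a flagged statement is only marked for review. Whether it is rewritten or removed is a separate judgement.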

1.2 Internal Scale Reliability
To provide an objective measure of our questionnaire reliability we contracted outside agencies involved in testing analysis to carry out reliability testing for us. They used a number of processes to assess the degree to which the questions are interconnected. The aim was to eliminate questions that asked the same thing and were therefore duplicating rather than differentiating a scale. The results were based on the "Cronbach's alpha" statistic, which assesses the degree to which items are inter-correlated. The statistic is interpreted much like a correlation and provides an objective measure of internal consistency. We accepted questions that produced a value of 0.55 and above, as these demonstrated a sufficiently acceptable level of internal reliability.
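For readers unfamiliar with the statistic, the standard formula for Cronbach's alpha can be sketched in Python as below. This is an illustration of the textbook calculation, not the agencies' actual procedure; the function name and the response data are hypothetical, and the 0.55 level is the one quoted in the text.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores, one row per participant."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two participants and two items")
    k = len(responses[0])
    # Sample variance of each item (column) across participants
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each participant's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five participants, four items on one scale
scale_responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(scale_responses)
print(f"alpha = {alpha:.2f}, meets 0.55 level: {alpha >= 0.55}")
```

Alpha rises when the items on a scale vary together, so a low value suggests that the items are not measuring the same thing.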

2. Criterion Validity
We were also interested to see how well the questionnaires' feedback output related to the performance outcomes of those who completed them. This sort of validity, called "criterion validity", is often regarded as the strongest possible support for a measure. For this we used the services of several psychometricians, as the processes employed are very complex and require the knowledge of experts to undertake them. We used the following processes on different occasions to provide us with feedback on the validity of the questionnaires:

2.1 Content Validity
The process involved examining all the questions that were attributed to each of the vectors associated with the questionnaire model. By thoroughly examining their content, the aim was to see if the questions that represented a vector actually described what that vector purported to measure. Any item that did not have sufficient support was eliminated and this enabled us to be confident that the behavioural statements included in the assessment of our questionnaires did in fact provide clear evidence of content validity.

2.2 Construct Validity
All our questionnaires are based on well researched behavioural 'models' and to provide additional validity for them we invited our experts to conduct a statistical factor analysis on our questionnaires. Our goal was to show that the constructs used can be identified in the data collected from each of the questionnaires tested. In our view the outputs of this assessment could add to the evidence supporting the validity of the measure and the approaches we were using.

We recognised that factor analysis requires a large amount of data, and we were able to provide this because our questionnaires contain an appropriate number of statements and we hold a large database of results. The factor analysis process mathematically condenses the data into a set of separate dimensions. As mentioned earlier, the technical procedures are complex and tend to involve correlations similar to those used to assess reliability. From the statistical output we were able to modify those questions that were not robust enough in supporting the construct, and each questionnaire was adjusted in this way until it was statistically sound.
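The idea of condensing correlated statements into separate dimensions can be illustrated with a deliberately simplified Python sketch. It uses principal component extraction by power iteration, a simpler relative of factor analysis, and is not the procedure our experts used. The correlation matrix, the function names and the choice of two dimensions are all hypothetical.

```python
from math import sqrt


def dominant_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(matrix)
    vec = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-uniform start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        length = sqrt(sum(x * x for x in product))
        vec = [x / length for x in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


def extract_components(matrix, count):
    """Extract `count` components, removing each one before finding the next."""
    k = len(matrix)
    work = [row[:] for row in matrix]
    results = []
    for _ in range(count):
        eigenvalue, vec = dominant_component(work)
        # Loadings: how strongly each statement relates to this dimension
        loadings = [v * sqrt(max(eigenvalue, 0.0)) for v in vec]
        results.append((eigenvalue, loadings))
        # Deflation: subtract the part of the matrix this component explains
        work = [[work[i][j] - eigenvalue * vec[i] * vec[j] for j in range(k)]
                for i in range(k)]
    return results


# Hypothetical correlations between four statements (illustrative only):
# statements 1-2 move together, as do statements 3-4.
correlations = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
]

components = extract_components(correlations, 2)
for number, (eigenvalue, loadings) in enumerate(components, start=1):
    print(number, round(eigenvalue, 3), [round(abs(x), 2) for x in loadings])
```

In this example the first dimension loads most heavily on statements 1 and 2 and the second on statements 3 and 4, which is the kind of pattern that supports a claim that each group of statements reflects its own construct. A full factor analysis would normally also involve choosing the number of factors and rotating them, which this sketch omits.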