Frequently Asked Questions

The Bayley Scales

What is the Bayley Short Form—Research Edition (BSF-R)? How does the BSF-R compare to the BSID-II? How does it compare to the BSID III?

The Bayley Scales of Infant Development-Second Edition (BSID-II), which assesses young children’s cognitive and motor development, was too long and complex for administration by non-clinicians during the ECLS-B home visit. Consequently, with the publisher’s permission, the Bayley Short Form—Research Edition (BSF-R) was developed. The BSF-R comprises a subset of BSID-II items that can be used to estimate performance on the full BSID-II yet is feasible for non-clinicians to administer in the home. This subset was chosen using Item Response Theory (IRT) modeling. Children’s estimated BSID-II scores, derived from their performance on the BSF-R, are on the ECLS-B data file. (The item-level BSF-R data used to estimate the BSID-II scores are not on the file.)

Creating the BSF-R.

The items administered in the BSF-R were selected based on ease of administration and on analysis of the item properties using IRT modeling. A two-parameter IRT model was used, with one parameter for item discrimination power and one for item difficulty level. BSID-II publisher data from the administration of BSID-II items to a standardization sample were obtained, and all the items were scaled on one metric. Items were then chosen based on difficulty level and discrimination power. First, the item pool was reduced to items that represented the constructs appropriate for the targeted age range at assessment, were spaced at even intervals of difficulty, and had a discrimination power of approximately 1. Then, within the reduced pool, items that were simple to administer and straightforward to score were chosen. Whenever possible, "twofers" were chosen: these are sets of items that can be scored from one administration (e.g., a child is given a cup and 5 blocks; the items "puts one block in cup," "puts 3 blocks in cup," and "puts 5 blocks in cup" can all be scored). Items requiring the fewest materials for administration were also preferred.
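To make the two item parameters concrete, the sketch below shows the usual logistic form of a two-parameter IRT model. It is only an illustration: the source does not state the exact functional form or any scaling constant used for the BSF-R, and the function name and item values here are hypothetical.

```python
import math

def two_pl_probability(theta, a, b):
    """Probability of passing an item under a two-parameter logistic model.

    theta: the child's ability on the common metric
    a:     item discrimination power
    b:     item difficulty level
    (Assumed logistic form; no scaling constant is applied.)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items with discrimination near 1 and different difficulties.
easy_item = {"a": 1.0, "b": -1.0}
hard_item = {"a": 1.0, "b": 1.5}

# For a child of average ability (theta = 0), the easier item is more
# likely to be passed than the harder one.
p_easy = two_pl_probability(0.0, **easy_item)
p_hard = two_pl_probability(0.0, **hard_item)
```

When a child's ability equals an item's difficulty, the probability of passing is 0.5, which is why items spaced at even difficulty intervals cover the ability range evenly.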

Once the final set of items was determined, the items were organized to approximate the BSID-II age sets. The BSID-II groups items into age sets so that no single child receives all the items. A child begins with the items in his/her age set (e.g., a 9-month-old begins with the 9-month age set, which has items appropriate for ages 8-11 months). If these items are too difficult, the assessor administers the age set for younger children (e.g., the 8-month-old age set, or even the 7-month-old age set). Conversely, if the items are too easy, the child is administered an age set for older children (e.g., the 10-month-old age set, or even the 11-month-old age set). To approximate the BSID-II age sets, the items chosen for the BSF-R were organized into a Core set (i.e., administered to all), a Basal set (i.e., administered to those who performed poorly on the Core set), and a Ceiling set (i.e., administered to those who performed perfectly or nearly perfectly on the Core set). In this way, deciding whether to administer the Basal or the Ceiling set was straightforward, and all children were appropriately challenged and assessed. The BSF-R diverges from the BSID-II primarily in its use of shortened core, basal, and ceiling item sets.
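The Core, Basal, and Ceiling routing described above can be sketched as a simple decision rule. The cutoffs are hypothetical parameters: the text says only that the Basal set goes to children who performed poorly on the Core set and the Ceiling set to those who performed perfectly or nearly perfectly, without giving the actual thresholds.

```python
def sets_to_administer(core_correct, low_cutoff, high_cutoff):
    """Return the item sets a child receives, given Core set performance.

    core_correct: number of Core items the child passed
    low_cutoff:   hypothetical score at or below which the Basal set is added
    high_cutoff:  hypothetical score at or above which the Ceiling set is added
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")
    sets = ["core"]  # the Core set is administered to every child
    if core_correct <= low_cutoff:
        sets.append("basal")
    elif core_correct >= high_cutoff:
        sets.append("ceiling")
    return sets

# Example with made-up cutoffs for a 20-item Core set.
print(sets_to_administer(3, low_cutoff=4, high_cutoff=18))   # poor Core performance
print(sets_to_administer(12, low_cutoff=4, high_cutoff=18))  # mid-range performance
print(sets_to_administer(19, low_cutoff=4, high_cutoff=18))  # near-perfect performance
```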

Lastly, the BSID-II uses a 30-item Behavior Rating Scale (BRS) to help interpret children's performance. Nine items were chosen from the BRS and included in the ECLS-B for this purpose. These items do not, however, approximate the full BRS.


Children's performance on the BSF-R was used to estimate their performance on the BSID-II through the use of IRT modeling. The standard errors of these estimates can be found on the data file (e.g., X1MTLSE, X1MTRSE, X2MTLSE, X2MTRSE). Separate scores were produced for the mental and the motor scales. Three types of scores were generated and can be found on the ECLS-B data file. The first, the scale score (i.e., X1RMTLS, X1RMTRS, X2MTLSCL, X2MTRSCL), represents the number of items a child would have gotten correct on the full BSID-II. It is a straight score and does not take prematurity into account.
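One common way in IRT to turn an ability estimate into an expected number-correct score is to sum the pass probabilities over every item in the full test. The sketch below illustrates that idea under an assumed two-parameter logistic model. It is not a statement of how the ECLS-B scale scores were actually computed, and the item parameters are made up.

```python
import math

def expected_number_correct(theta, items):
    """Expected number of items passed on a full test.

    theta: estimated ability of the child
    items: list of (discrimination, difficulty) pairs for every item on
           the full test (hypothetical values in this sketch)
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items)

# Made-up parameters for a five-item test.
full_test = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

low_ability_score = expected_number_correct(-1.0, full_test)
high_ability_score = expected_number_correct(1.0, full_test)
```

Because the score is a sum of probabilities, it need not be a whole number, and a higher ability estimate always yields a higher expected score.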

Also on the file is the child's ranking relative to other children his/her age in the ECLS-B sample, correcting for prematurity. This ranking is similar to the Developmental Index scores on the BSID-II; it is a standardized score that can be used to compare groups of children. T-scores were used to standardize ECLS-B children's scale scores (i.e., X1MTLT, X1RMTR1, X2MTLTSC, X2MTRTSC). The T-scores have a mean of 50 and a standard deviation of 10. As mentioned above, these scores take premature birth into account. To obtain the child's chronological age at assessment, the child's birth date was subtracted from the date of the assessment. For children who were born at least 21 days (i.e., 3 weeks) early, the amount of prematurity (e.g., 4 weeks) was then subtracted from the child's age at assessment. In this way, children born prematurely were ranked relative to other children at the same developmental age (as opposed to chronological age).
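The age adjustment and the T-score scaling can be illustrated as follows. This is a simplified, unweighted sketch: the function names are hypothetical, and the actual ECLS-B standardization (for example, how children were grouped by age or whether sample weights were applied) is not described here.

```python
from datetime import date
from statistics import mean, pstdev

def adjusted_age_days(birth_date, assessment_date, days_early):
    """Age at assessment in days, corrected for prematurity.

    The correction applies only when the child was born at least
    21 days (3 weeks) early, as described in the text.
    """
    age = (assessment_date - birth_date).days
    if days_early >= 21:
        age -= days_early
    return age

def t_scores(scores):
    """Rescale scores to a mean of 50 and a standard deviation of 10."""
    m = mean(scores)
    s = pstdev(scores)  # population SD of the group being standardized
    return [50 + 10 * (x - m) / s for x in scores]

# A child born 28 days early is compared with children 28 days younger.
age = adjusted_age_days(date(2001, 1, 1), date(2001, 10, 1), days_early=28)

# Made-up scale scores for a small group of same-age children.
standardized = t_scores([60.0, 72.0, 75.0, 81.0, 90.0])
```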

Lastly, the ECLS-B data file includes 20 proficiency probability scores: 10 from the Mental Scale and 10 from the Motor Scale. The individual proficiencies are described in the User's Manual. Each proficiency probability was generated from the child's estimated performance on 4 to 6 BSID-II items and represents the probability that the child has mastered the skill represented by that proficiency. Thus, the proficiency probability scores range from 0 to 1. Scores on a particular proficiency can be averaged across children to produce estimates of mastery rates within population subgroups.
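Averaging a proficiency probability within subgroups might look like the sketch below. The group labels and probabilities are invented for illustration, and the average is unweighted; analyses of survey data typically also apply sample weights, which this example omits.

```python
from collections import defaultdict

def mastery_rates(records):
    """Average proficiency probability per subgroup.

    records: iterable of (subgroup_label, proficiency_probability) pairs,
             one per child, with each probability between 0 and 1.
    Returns a dict mapping each subgroup to its estimated mastery rate.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, probability in records:
        totals[group] += probability
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical children in two subgroups.
children = [
    ("group_a", 0.25), ("group_a", 0.75),
    ("group_b", 0.5), ("group_b", 1.0),
]
rates = mastery_rates(children)
```

A rate of 0.5 for a subgroup would be read as an estimate that about half of the children in that subgroup have mastered the skill.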


The BSF-R is a subset of the BSID-II and can be equated with the BSID-II using IRT modeling.

  • Like the BSID-II, the BSF-R has a Mental Scale and a Motor Scale. While the BSID-II has 178 Mental items and 111 Motor items, the BSF-R has 29 Mental items at 9 months and 33 Mental items at 2 years, and 35 Motor items at 9 months and 32 Motor items at 2 years.
  • The BSID-II groups items in age sets; the BSF-R has a Core set of items that is administered to all children and supplementary Basal and Ceiling item sets that are administered if needed.
  • The BSID-II generates a raw score or "true" score that is then converted to a standardized score known as the Mental Developmental Index or the Motor Developmental Index (MDI). IRT modeling was used to estimate the BSID-II raw or true scores from the BSF-R. These estimated scale scores and their associated standard errors are on the ECLS-B data file. Additionally, T-scores are used to standardize the raw scores relative to the ECLS-B sample, taking into account prematurity.
  • The ECLS-B also provides proficiency probabilities based on the BSID-II raw scores.

Only BSID-II scores are on the ECLS-B data file; the item-level BSF-R scores used to estimate these BSID-II scores are not available on the file.


The BSID-III, published in October of 2005, differs from the BSID-II in that it assesses development in more than just the cognitive and motor domains. It also examines development in areas such as language and adaptive behavior and in the socioemotional domain. Additionally, the BSID-III is normed on a more recent population than the BSID-II; the BSID-III uses the 2000 census in stratifying children by age. More information about the BSID-III can be found at