Examining Kindergarten Children’s Testing and Optimising in the Context of a Gear Engineering Task

Introducing kindergarten children to the engineering design process (EDP) is an important objective of early STEM education. Studies indicate that children often skip the crucial steps of testing and optimising during the EDP and do not persist in improving their solutions. The present study explores how children’s goal awareness, self-evaluation ability, domain-specific content knowledge, spatial skills and intelligence relate to their persistence, testing and optimising behaviour and to solution quality. In a standardized procedure, 41 children (4 to 7 years) in Germany worked on an engineering task in the domain of gears. The engineering process was videotaped; children’s testing and optimising, solution quality, goal awareness, self-evaluation, and task persistence were rated using a coding scheme and interviewer questions. Domain-specific content knowledge, mental rotation ability and figural reasoning were measured with standardized tests. Correlational analyses indicated that goal awareness was positively related to solution quality. However, most children required support from the interviewer to retrieve the goal specifications. Moreover, children’s self-evaluation was negatively related to task persistence. Most children were satisfied with their first solution, even when it did not meet the requirements. Our findings emphasize the important role of teachers in helping children to tackle challenges with the EDP.


INTRODUCTION
Scientists and practitioners agree that STEM (science, technology, engineering, and mathematics) education should begin as early as kindergarten (e.g., NRC, 2013; OECD, 2018), as it can provide children with a fundamental knowledge base for learning STEM subjects in later school years (Kaderavek et al., 2020). Early STEM education aims at promoting children's understanding of domain-specific content knowledge, fostering domain-general process skills, and establishing an understanding of epistemic practices (NRC, 2013). In recent years, engineering has been increasingly emphasized as a central component of early STEM education (Bustamante et al., 2018; John et al., 2018). For instance, the Next Generation Science Standards framework in the US integrated engineering into science education by raising engineering design to the same level as scientific inquiry, thereby emphasising how closely the two are linked (NRC, 2013; on the close link see also Schauble et al., 1991). Both require the analysis and interpretation of data, either (i) to understand relations between causes and effects (scientific inquiry) or (ii) to determine whether a designed product works as intended to solve a problem (engineering design). Moreover, there is a growing body of research exploring young children's engineering skills in both formal learning settings, such as preschool and elementary school (Bairaktarova et al., 2011; Gold et al., 2020; Kendall,

Empirical Studies Examining Children's Task Persistence, Testing and Optimising
Testing and optimising require the ability to analyse and interpret data (NRC, 2013), which develops between the ages of 3 and 7 (Zimmerman and Klahr, 2018). Children as young as 3 years are capable of spontaneously producing evidence that would enable causal learning (Schulz et al., 2007). However, studies observing children's engineering indicate that pre-schoolers and kindergartners do not sufficiently analyse and interpret data since they tend to neglect testing and optimising (for an overview, see Gold et al., 2020). For instance, Bairaktarova et al. (2011) stress that 3- to 5-year-olds' testing during free play with unstructured materials (such as sandboxes or water tables), semi-structured materials (such as paints and paper), and structured play materials (such as snap circuits) focused mainly on the completion of a single construction rather than on evaluating different prototypes to improve the solution quality. Similarly, in a study by Strimel et al. (2018a), 5- to 7-year-old kindergarten children spent only a very small part of their effort on evaluating solutions when solving engineering design tasks. Moreover, most children ended their design session once a single prototype was built (Strimel et al., 2018a). So far, little is known about young children's persistence and their neglect of testing and optimising. However, the few available studies indicate that kindergarteners' persistence, testing and optimising might be related to the problem type (well- vs. ill-defined), children's goal awareness, their ability to realistically self-evaluate solutions, and domain-specific content knowledge (Kendall, 2015; Lottero-Perdue and Tomayko, 2020; Strimel et al., 2018a). In the following, we will briefly address these points.
Children might need to be aware of the goal in order to test and optimise. Strimel et al. (2018a) suggest that the low incidence of persisting, testing and optimising in their study might have been due to the fact that they provided children with ill-defined engineering tasks. In ill-defined engineering tasks, the goal and constraints are not given in advance; thus, the criteria for evaluating a solution must be determined by the problem solver (Dörner and Funke, 2017). As kindergarten children often find it difficult to identify such criteria, well-defined engineering tasks seem more appropriate to investigate kindergarten children's testing and optimising (Haluschak et al., 2018; Strimel et al., 2018a). In well-defined engineering tasks, there is a clearly defined goal that allows for multiple solutions, and the criteria to evaluate a solution are given in advance (Crismond, 2001). In a museum activity context, Pagano et al. (2019) investigated how setting a goal affected families' conversations about engineering practices. Families in a program with a set goal reflected more often on testing, things not working, optimising, and being successful or unsuccessful than families who completed an activity without a set goal.
Nevertheless, well-defined engineering tasks require the problem solver to build and maintain a mental representation of the goal state, which places high demands on the problem solver's working memory (Kintsch, 2007). As the capacity of working memory in kindergarten children is still developing (Diamond, 2013), children might have problems in building and maintaining adequate mental representations of a goal state. Karmiloff-Smith (1979) demonstrated that 4- to 9-year-olds built accurate initial mental representations of a problem's goal state, but tended to simplify the problem by modifying their mental representation of the goal state during their problem-solving process. In a qualitative study with a well-defined engineering task, Kendall (2015) investigated how six kindergarten children evaluated two design solutions of bridges. When asked to recall the requirements the bridge had to meet (length and sturdiness), most children recalled only one of the two requirements. Consequently, studies on kindergarten children's persisting, testing and optimising should check whether children are aware of a task's goal and the task's requirements.
Learning from failures by continuously testing and optimising prototypes requires the ability to assess one's own performance and solutions realistically. Findings from developmental psychology indicate that self-evaluations of 4- to 7-year-olds tend to be unrealistically positive (Harter and Bukowski, 2015; Marsh et al., 2012; Oppermann et al., 2018). Such a bias in perception might lead children not to optimise, as there is nothing to optimise from their point of view. When the children in the study by Kendall (2015) were asked to test and optimise the engineered bridges, they tended to focus on what worked, but neglected failures and their sources. In contrast, in a study by Lottero-Perdue and Tomayko (2020), 5- to 7-year-old kindergarten children were able to accurately self-evaluate their engineered solution with respect to the requirements it had to meet, and almost all of the children opted to optimise their design when asked by the researchers. As the authors acknowledge, the engineering task was quite straightforward with respect to success or failure: The children were asked to build a fence out of wooden and foam blocks to contain a small, fast and randomly moving HEXBUG nano® robot and give the robot as much room as possible to move. The children could hardly ignore design failures, as in that case the robot evidently broke through the fence. Most children were able to detect the causes of failures (e.g., the foam blocks were too light); however, an accurate failure analysis did not automatically lead the children to apply this knowledge in a revised design solution (Lottero-Perdue and Tomayko, 2020). Thus, research on kindergarten children's engineering behaviour might investigate how children's self-evaluation of their solutions relates to their persistence, testing and optimising.
Professional engineers apply scientific and mathematical content knowledge to design, test and optimise solutions (Pleasants and Olson, 2019). For young children, however, the relationship between domain-specific content knowledge and testing, optimising and engineering outcome is less straightforward. On the one hand, studies with 3- to 5-year-olds suggest that children are able to carry out meaningful tests and manipulations on a novel toy, use this evidence to infer causal structures, and thus learn how the toy works without prior domain-specific knowledge (Schulz et al., 2007). Thus, engaging children in engineering practices such as testing and optimising might provide learning opportunities for relevant domain-specific content knowledge (Lin et al., 2020; Penner et al., 1997).
On the other hand, museum studies with 4- to 9-year-old children and their parents in two conditions (brief instruction on relevant domain-specific content knowledge or not) showed that instructed families displayed more engineering-related talk, and their children made use of the domain-specific knowledge more often than in the other condition (Benjamin et al., 2010; Marcus et al., 2021). To the best of our knowledge, engineering studies with young children have so far not measured children's prior knowledge in the relevant science domain. In the present study, we therefore measure prior knowledge and examine how it relates to task persistence, testing, optimising and solution quality.
Moreover, spatial ability has been found to be strongly associated with the development of expertise in STEM over the course of education (Wai et al., 2009). With respect to engineering, spatial skills have been found to be closely linked to physical object manipulation (such as testing and making changes to constructions) for 9- to 13-year-old children (Ramey and Uttal, 2017). The correlation between spatial skills and engineering is likely to be greater the more the engineering task requires the problem solver to mentally manipulate information about objects in the environment and the motion of objects (Tõugu et al., 2017). In the present study, the children were given a task in which the rotation of gears had to be considered. Consequently, mental rotation ability could play a role and was therefore assessed.
Furthermore, fluid intelligence is one of the most important prerequisites for learning and is closely interrelated with reasoning and problem-solving (Diamond, 2013). To the best of our knowledge, studies on engineering in kindergarten have neither controlled for children's fluid intelligence nor examined how it might relate to persistence, testing, optimising and solution quality.

Present Study
In the present exploratory study, we aim at learning more about how kindergarten children's persistence, testing and optimising in an engineering task are related to children's goal awareness, their self-evaluation of the solution, and their domain-specific content knowledge, spatial skills, and fluid intelligence. As previous studies mostly used qualitative methods providing fine-grained insights, we would like to complement these qualitative analyses with a more quantitative approach by measuring and statistically correlating children's persistence, testing and optimising, goal awareness, self-evaluation of the solution, and the objective solution quality to learn more about the interrelations. Additionally, we measure children's content knowledge in the relevant domain, and include mental rotation ability as an indicator of spatial skills as well as figural reasoning as an indicator of fluid intelligence in our analysis. We address the following research questions (RQs):
1. RQ-1: What solution quality do the children achieve? (solution quality)
2. RQ-2: Do the children test and optimise? (testing and optimising)
3. RQ-3: Are the children aware of the goal? (goal awareness)
4. RQ-4: Are the children satisfied with their solution? (self-evaluation)
5. RQ-5: Do the children make full use of the given time for engineering? (time-on-task as an indicator of task persistence)
6. RQ-6: Do the children want to make a change to their solution when explicitly asked? (willingness to make changes as an indicator of task persistence)
7. RQ-7: How are solution quality, testing, optimising, goal awareness, self-evaluation, task persistence, domain-specific content knowledge, spatial skills, and fluid intelligence related?
We expect that the better the children's goal awareness is, the more they will test and optimise, which will correspond with a higher time-on-task and result in better solution quality. Moreover, we assume a negative correlation between children's self-evaluation and their willingness to make changes to their solution.

Participants
Forty-one children (18 girls and 23 boys) participated in the study. We recruited the children by convenience sampling in seven kindergartens in a rural part of Germany. All participants were informed about the goals of the study. The children participated voluntarily and with the written consent of their parents. The convenience sampling yielded the following age distribution: Three children were 4 years old (one child was 51 months, two children were 58 months), 17 children were 5 years old, another 17 children were 6 years old, and two children were 7 years old (82 months and 91 months). The age information is missing for two children. Given the exploratory nature of our study, we deliberately allowed this wide age range to obtain as much variance as possible in our data.

Engineering Task
We developed a well-defined engineering design task in the domain of gears. Gears seem to be a suitable domain to study 4- to 7-year-old children's testing and optimising for at least three reasons: (i) Children can easily construct, test and modify gears with age-appropriate (toy) gear kits. (ii) Gears allow for formulating tasks with clear goals (well-defined problems), e.g., making a target gear spin in a given direction with a specified turning direction of a driving gear. Moreover, the children can directly and constantly test the achievement of such a goal by turning the gears and receive immediate feedback by observing the resulting turning direction. (iii) Gears are an elementary component of mechanics and incorporate domain-specific content knowledge. Developmental studies suggest that children differ with respect to their content knowledge about gears' turning direction (Reuter and Leuchter, 2021; Lehrer and Schauble, 1998). Among 4- to 7-year-olds, there are children who know that meshed gears turn in opposite directions, children who have naïve concepts (meshed gears turning in the same direction), and children who have no apparent concepts (Reuter and Leuchter, 2021).
In a one-to-one setting, the children received a plastic peg board (31 cm × 22 cm) with two red plastic gears (approximately 12 cm in diameter with 14 teeth) that were permanently fixed on the board. One gear was on the left-hand side at the front of the peg board and had a crank (driving gear). An arrow pointing to the right was glued on the driving gear. The other gear was on the right-hand side further back on the peg board. This gear also had an arrow on it pointing to the right. Additionally, a sitting stick figure made of paper looking in the direction of the arrow was placed at the gear (see Figure 1).

Figure 1. Peg board driving gear and "carousel" | construction materials
The interviewer told the children that this gear was a 'carousel' with a person on it. Then, the interviewer instructed the children that the 'carousel' should turn forward, i.e., in the direction of the arrow (specification 1) when the driving gear is turned in the direction of the arrow (specification 2). As construction materials, the children received two red gears that were the same size as the driving gear and carousel (large gear), two medium-sized blue gears (approximately 9 cm in diameter with ten teeth), two small purple gears (approximately 6 cm in diameter with six teeth) and six plugs (see Figure 1). The peg board had prefabricated holes that allowed one to easily mount the gears with a plug. The interviewer pointed out to the children that they could use all of the materials, but did not have to do so. It was possible to connect the driving gear with the carousel using two gears (e.g., the two large red ones). The intended difficulty for the children was to realize that this straightforward (and maybe the most obvious) solution would result in the wrong turning direction of the carousel; thus, it would not meet specification 1. The task can be characterized as a well-defined engineering task since it had a clear goal state, i.e., clearly defined specifications (turning direction of the carousel with a given turning direction of the driving gear); it demanded goal-oriented thinking under constraints (the driving gear and carousel were fixed, and the construction materials and space were restricted); and it allowed for more than one possible solution (e.g., connecting driving gear and carousel with three or with five gears, and there were different possible arrangements of the gears).
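The direction constraint behind the task can be made explicit with a short sketch. It assumes a single linear chain of meshed gears between the driving gear and the carousel and relies only on the fact that two meshed gears turn in opposite directions. The function names and the +1/-1 encoding of turning directions are illustrative and not part of the study materials.

```python
def carousel_direction(n_connecting_gears: int, driving_direction: int = 1) -> int:
    """Return the carousel's turning direction (+1 or -1).

    Assumes one linear chain: driving gear -> connecting gears -> carousel.
    Each mesh reverses the direction, and a chain with n connecting gears
    contains n + 1 meshes.
    """
    if n_connecting_gears < 0:
        raise ValueError("number of connecting gears cannot be negative")
    meshes = n_connecting_gears + 1
    return driving_direction * (-1) ** meshes


def meets_specifications(n_connecting_gears: int) -> bool:
    """True if the carousel turns the same way as the driving gear."""
    return carousel_direction(n_connecting_gears) == 1


for n in (2, 3, 4, 5):
    verdict = "adequate" if meets_specifications(n) else "inadequate"
    print(n, "connecting gears ->", verdict)
```

Under this assumption, an odd number of connecting gears satisfies both specifications, which matches the three- and five-gear solutions mentioned above, whereas the two-gear connection reverses the carousel.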

Test Procedure
The engineering design task was conducted in one-to-one situations in a quiet room in the children's kindergartens. The interviewer and the child sat side by side at a table or on the floor. The task followed a standardized procedure consisting of three or four steps: (i) introduction, (ii) engineering trial 1, (iii) assessment of task persistence, goal awareness and probing for children's self-evaluation of their solution and, if applicable, (iv) engineering trial 2. Table 1 describes the procedure of the engineering task in detail. The complete procedure was videotaped.

Coding Scheme for the Engineering Design Task
To analyse the children's testing and optimising, we developed a coding scheme in a two-step process. First, all categories of the coding scheme were derived in a top-down approach. We therefore adapted parts of the Metacognitive Skills in Constructional Play Engagement (MetaSCoPE) coding scheme (Spektor-Levy et al., 2017). Second, two raters independently tested the coding scheme with 20 videos. To obtain interrater agreement, we calculated Cohen's kappa (Cohen, 1960). In an iterative process, the coding scheme was revised in a bottom-up procedure, uncertainties were discussed by the raters and decision criteria were clarified. The final coding scheme comprised a code to rate the solution quality, four dichotomous codes indicating children's testing and optimising, and two codes to record the children's goal awareness.
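As a minimal sketch of the agreement statistic, the following code computes Cohen's kappa for two raters and attaches the verbal benchmark of Landis and Koch (1977). The ratings are hypothetical and serve only to illustrate the calculation; they are not data from the study, and the function names are our own.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


def landis_koch_label(kappa):
    """Verbal benchmark for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    benchmarks = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                  (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in benchmarks:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical solution-quality codes (0-3) for ten videos, NOT study data.
rater_a = [3, 2, 2, 1, 0, 3, 2, 1, 0, 2]
rater_b = [3, 2, 1, 1, 0, 3, 2, 1, 0, 3]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3), landis_koch_label(kappa))
```

Kappa corrects the raw agreement rate for the agreement expected by chance given each rater's code frequencies, which is why it is lower than the simple proportion of identical codes.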

Solution quality
We rated the quality of the prototype the children had built with a score ranging from 0 to 3 points (see Table 2) after trial 1 and, if applicable, after trial 2. Rater agreement was substantial with Cohen's kappa=.767. However, given the clear coding scheme, this appears to be a rather low agreement. In some cases, it was not clear from the videos whether two (or more) gears were meshing or not. We decided for these cases to always code as if the gears were meshing (i.e., to give the higher score).

Testing and optimising
We coded indications of testing and optimising for trial 1 and, if applicable, for trial 2. To account for the findings of Bairaktarova et al. (2011) and Strimel et al. (2018a), we distinguished between children's testing of incomplete prototypes (e.g., turning two meshed gears) and of complete prototypes (turning the driving gear to move the carousel) during the trial. Following the same logic, we distinguished between changes children made to incomplete and to complete prototypes during the trial. Cohen's kappa was .614; thus, according to Landis and Koch (1977), rater agreement can be regarded as substantial.
Note. When a child builds a complete prototype but then makes an alteration that renders it incomplete again at the 3-minute mark of trial 1 (or the 2-minute mark of trial 2), it is coded as an incomplete prototype.

Goal awareness
We recorded and coded whether the children were able to name specification 1 (the carousel should turn in the direction of the arrow) without the help of the interviewer, with help or not at all. We did the same for specification 2 (the driving gear has to be turned in the direction of the arrow). The final rater agreement was substantial with Cohen's kappa=.754. We calculated a score to quantify children's goal awareness. For each of the two specifications, the children received 2 points if they provided the correct answer on their own, 1 point if they provided the correct answer after the probing of the interviewer, and 0 points if they did not give a correct answer even after probing. Thus, the score ranged from 0 to 4 points.
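The scoring rule for goal awareness can be written down as a small sketch. The category labels ("unaided", "after_probing", "not_named") are hypothetical names chosen for illustration; only the point values are taken from the description above.

```python
# Points per specification, following the scoring rule described above.
# The category labels are illustrative, not the original codes.
POINTS = {"unaided": 2, "after_probing": 1, "not_named": 0}


def goal_awareness_score(specification_1: str, specification_2: str) -> int:
    """Sum the points for both specifications (range 0-4)."""
    return POINTS[specification_1] + POINTS[specification_2]


# A child who names specification 1 unaided and specification 2 after probing.
print(goal_awareness_score("unaided", "after_probing"))  # 3
```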

Testing Domain-Specific Knowledge, Spatial Skills and Figural Reasoning
We tested children's domain-specific conceptual knowledge of gears' turning direction in a standardized one-to-one interview (gear-interview). The children were shown eight different gear configurations (items) built with physical gears. Each item consisted of a driving gear with an arrow indicating the turning direction that was meshed with one or more other gears (target gears). The children had to predict the turning direction of the target gears. All gears were fixed, that is, they could not rotate. This meant that the children could not learn during the test. Moreover, the children had received no teaching on how gears function beforehand. For a detailed description of the procedure see Authors. The internal consistency was high with αKD20=.912. The children could achieve a maximum score of 26 points.
We assessed mental rotation ability with the Picture Rotation Test (PRT) (Quaiser-Pohl, 2003) as an indicator for spatial skills. Children were presented with coloured three-dimensional target images of people (e.g., an ice-skater) and animals (e.g., a tiger) as well as three comparison images. Two of them were mirrored and rotated and one was only rotated. The children had to identify the rotated image of the target image. The maximum score was 16 points.
As an indicator of fluid intelligence, we measured figural reasoning with the matrices subtest from the Culture Fair Test (CFT) (Weiß and Osterland, 2013). The children could reach a maximum score of 15 points. All tests were conducted about one week before the engineering task to keep testing time and cognitive load adequate for the age group.

Correlational Analysis
For the correlational analysis, we considered the different measurement levels of the variables and whether the data were distributed normally or not (cf. Field, 2011; see Table 3). We calculated Pearson's correlation coefficient r for pairs of variables that were normally distributed at the interval level. We calculated Spearman's correlation coefficient ρ for pairs of variables with one or both variables non-normally distributed at the interval level. We calculated the biserial correlation coefficient r_b for pairs of variables where one of the variables was measured dichotomously but had an underlying continuum, and the other variable was normally distributed at the interval level. In cases of non-normally distributed data of the interval level variable, we calculated the biserial correlation coefficient based on Spearman's correlation coefficient ρ. We calculated the tetrachoric correlation coefficient r_tet for pairs of dichotomously measured continuous variables.
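The selection rules can be summarised as a small decision function. This is a sketch of the logic only, not the analysis code used in the study: the `Variable` class, the function name and the normality flags in the example are hypothetical and were chosen merely to exercise every branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    dichotomous: bool = False   # coded 0/1, assumed to reflect an underlying continuum
    normal: bool = False        # only relevant for interval-level variables


def choose_coefficient(x: Variable, y: Variable) -> str:
    """Pick the correlation coefficient according to the rules in the text."""
    if x.dichotomous and y.dichotomous:
        return "tetrachoric"
    if x.dichotomous or y.dichotomous:
        interval = y if x.dichotomous else x
        return "biserial" if interval.normal else "biserial (Spearman-based)"
    if x.normal and y.normal:
        return "Pearson"
    return "Spearman"


# Hypothetical normality flags, chosen only to exercise every branch.
rotation = Variable("mental rotation", normal=True)
reasoning = Variable("figural reasoning", normal=True)
quality = Variable("solution quality", normal=False)
tested = Variable("tested at least once", dichotomous=True)
changed = Variable("made at least one change", dichotomous=True)

pairs = [(rotation, reasoning), (rotation, quality), (tested, rotation),
         (tested, quality), (tested, changed)]
for x, y in pairs:
    print(x.name, "x", y.name, "->", choose_coefficient(x, y))
```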

RQ 1: Solution quality
The mean quality of children's solutions in trial 1 was 1.61 (SD=1.02, range=0-3). Nine children built a complete and adequate prototype, 14 children a complete but inadequate prototype. Eleven children had an incomplete prototype at the end of trial 1 (max. 3 minutes), and seven children had no prototype. For the subsample of the children who did trial 2 (n=23), the mean score slightly increased from 1.48 (SD=0.95) after trial 1 to 1.70 (SD=0.88) after trial 2. However, the mean scores of trial 1 and trial 2 did not differ significantly, t(22)=-1.417, p=.171, as a paired sample t-test revealed. Thus, only trial 1 will be considered in the further analyses.
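The descriptive statistics for trial 1 can be recomputed from the reported frequencies. The sketch below assumes the score mapping 3 = complete and adequate, 2 = complete but inadequate, 1 = incomplete, 0 = no prototype, and assumes that the reported SD is the sample standard deviation (n - 1 in the denominator).

```python
from statistics import mean, stdev

# Number of children per solution-quality score after trial 1 (from the text above).
counts = {3: 9, 2: 14, 1: 11, 0: 7}

# Expand the frequency table into one score per child.
scores = [score for score, n in counts.items() for _ in range(n)]

print(len(scores))               # 41 children
print(round(mean(scores), 2))    # 1.61
print(round(stdev(scores), 2))   # 1.02 (sample standard deviation)
```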

RQ 2: Testing and optimising
We first analysed children's testing and optimising regardless of the completeness of the prototype. Thirty children tested at least once. Twenty-six children made at least one change. To examine whether children's testing behaviour differed for incomplete and complete prototypes, we restricted our analysis to children who finished trial 1 with a complete prototype (n=23). Sixteen of them tested the functioning of both incomplete and complete prototypes. Four of them tested the functioning of complete prototypes only, and one child the functioning of incomplete prototypes only. Two of them did not test at all. To examine whether children's optimising behaviour differed for incomplete and complete prototypes, we restricted our analysis to children who finished trial 1 with a complete, but inadequate prototype (n=14). Six of them made changes to both incomplete and complete prototypes, another six children made at least one change to an incomplete, but not to a complete prototype, and two of them did not make any changes.

RQ 3: Goal awareness
The mean score for goal awareness was 2.24 (SD=1.07, range=0-4). Five children (12%) provided a complete and correct description of both specifications on their own. Twenty-seven children (66%) initially provided an incomplete description of the task and did not mention the turning direction. However, with probing by the interviewer, they were able to give the complete description. Thus, there were a total of 32 children (78%) who could correctly name both specifications. The remaining nine children (22%) provided an incomplete description or no description of one or both specifications, despite the interviewer's probing.

RQ 4: Self-evaluation
When asked by the interviewer, 31 children (76%) said that they were satisfied with their solution in trial 1.

RQ 5 and RQ 6: Task persistence
Sixteen children (39%) made full use of the time allowed at trial 1 and thus were interrupted by the interviewer after 3 minutes of engineering. Correspondingly, the other 25 children (61%) presented a solution to the interviewer before the 3 minutes were over. Their mean time-on-task was 114 seconds (SD=41.8). When asked by the interviewer, 23 children (56%) opted to make changes to their solution. Hence, these 23 children did trial 2. Nine of these children made full use of the two minutes that were allowed in trial 2. The other 14 children finished before the two minutes were over with a mean time-on-task of 76 seconds (SD=32.8). The mean time-on-task for the children who did trial 1 only was 134 seconds (SD=51.2), and it was 237 seconds (SD=58.4) for the children who did both trial 1 and trial 2.

RQ 7: correlations
The solution quality was positively related to testing, to optimising, to goal awareness and to children's mental rotation ability. Testing was positively related to optimising, and to time-on-task. Optimising was positively related to goal awareness and to time-on-task. The children's self-evaluation of their solution was negatively related to time-on-task. Moreover, self-evaluation was negatively related to the children's willingness to make a change to their solution. Mental rotation ability (M=9.78, SD=4.11) was positively correlated with figural reasoning ability (M=4.29, SD=2.72). Domain-specific content knowledge on gears' turning direction (M=11.49, SD=5.64) did not significantly correlate with any of the other variables. Table 3 gives an overview of the correlation matrix.
Our next analysis is concerned with the association between solution quality and all other variables (Table 4). However, with respect to content knowledge, the mean value does not adequately represent the concepts. Thus, we categorize the children as either having the correct concept (meshed gears turn in opposite directions), an incorrect concept (meshed gears turn in the same direction), or no apparent concept by using the binomial distribution with n=26 and p=1/2 (P[X≥17]=8.43%, meaning that at least 17 out of 26 consistent responses have an 8.43% probability of occurring by chance, i.e., by guessing). Children with at least 17 out of 26 correct predictions in the gear interview were considered as having the correct concept (n=9). Children with at least 17 out of 26 incorrect predictions were considered as having the incorrect concept (n=14). Children without a consistent answer pattern were considered as guessing, thus having no apparent concept (n=18).
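The 8.43% chance level corresponds to the upper tail of the binomial distribution, that is, the probability of at least 17 consistent responses out of 26 when each response is a fair guess. A minimal sketch of the calculation (the function name is ours):

```python
from math import comb


def binomial_tail(n: int, k_min: int, p: float = 0.5) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))


n_predictions = 26   # maximum score in the gear interview
threshold = 17       # consistent responses required to assign a concept

print(f"{binomial_tail(n_predictions, threshold):.2%}")   # 8.43%
```

Note that the probability of exactly 17 consistent responses would be smaller (about 4.7%), so the reported value only follows from the "at least 17" criterion used for classification.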
Due to the small sample size, Fisher-Freeman-Halton's exact test was used to determine if there was a significant association between solution quality and the respective variable (Bortz and Lienert, 2008). In the case of a significant association, we performed post-hoc pairwise comparisons with a Bonferroni-corrected significance level of p<.008. There was a statistically significant association between solution quality and testing, χ²=10.942, p=.006. The higher the solution quality, the higher the proportion of children who tested at least once. Pairwise comparisons revealed that children with a complete and adequate prototype (3 points) differed significantly (Fisher's exact test, p=.005) from children with no prototype (0 points) in their testing behaviour (9/9 vs. 2/7). Furthermore, there was a statistically significant association between solution quality and optimising, χ²=9.776, p=.016. Pairwise comparisons revealed that children with a complete, but inadequate prototype (2 points) differed significantly (Fisher's exact test, p=.003) from children with no prototype (0 points) in their optimising behaviour (12/14 vs. 1/7). There was no statistically significant association between solution quality and any of the other variables (all p>.05). However, on a descriptive level, there was a U-shaped association between solution quality and self-evaluation: For both 3 points and 0 points, the proportion of children who were satisfied with their constructions was high (9/9 and 6/7), whereas for both 2 points and 1 point the proportion was comparably low (9/14 and 7/11). Likewise, on a descriptive level, an inverted U-shaped association could be observed between solution quality and task persistence: For both 3 points and 0 points, the proportion of children making full use of the three minutes was rather low (2/9 and 1/7), whereas for both 2 points and 1 point the proportion was comparably high (7/14 and 6/11).
Furthermore, 10 out of 11 children with 1 point in solution quality wanted to make a change to their construction when asked by the interviewer after trial 1. For all other children, this proportion was lower, at 43%.
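The pairwise comparisons reported above can be retraced with a short sketch of the two-sided Fisher's exact test for a 2x2 table. The implementation below is our own illustration, not the software used in the study. It also shows one plausible reading of the Bonferroni-corrected level, under the assumption that all six pairs of the four solution-quality levels were compared.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Unnormalised probability (an integer) of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(total, col1)


# Tested at least once: 9 of 9 (3 points) vs. 2 of 7 (0 points).
print(round(fisher_exact_two_sided(9, 0, 2, 5), 3))    # 0.005
# Made at least one change: 12 of 14 (2 points) vs. 1 of 7 (0 points).
print(round(fisher_exact_two_sided(12, 2, 1, 6), 3))   # 0.003

# Bonferroni level, ASSUMING all 6 pairs of the 4 quality levels are compared.
print(round(0.05 / comb(4, 2), 4))                     # 0.0083
```

Working with integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.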
With respect to children's content knowledge of gears' turning direction, descriptive data show on the one hand: The better the solution quality, the greater the proportion of children with the correct concept. On the other hand, the data also show: Among the children with an adequate prototype (3 points), 3 out of 9 had the correct concept of gears' turning direction. Another 3 achieved the 3 points with an incorrect concept, and the remaining 3 children without an apparent concept.
Table 4. Association between solution quality and testing and optimising, goal awareness, self-evaluation, task persistence, and content knowledge (solution quality after trial 1)

DISCUSSION
In our exploratory study, we attempted to learn more about kindergarten children's persistence, testing and optimising in an engineering task and how these behaviours are related to children's goal awareness, their self-evaluation of the solution, their domain-specific content knowledge, spatial skills, and fluid intelligence. To this end, we had 4- to 7-year-old children work on a well-defined engineering task with more than one possible correct solution in the domain of gears. We rated the solution quality (RQ-1); observed the children's engineering behaviour for incidences of testing and optimising (RQ-2); and measured goal awareness (RQ-3), self-evaluation of the solution (RQ-4), and task persistence (RQ-5 and RQ-6). Moreover, we included the children's domain-specific content knowledge, mental rotation ability and figural reasoning ability and explored how all aspects were related to each other (RQ-7). The discussion section aims at integrating the results of the research questions and attempts to draw a coherent picture of the findings.

Testing, Optimising and Task Persistence
The majority of the children showed at least one indication of testing and optimising in the course of the engineering trial. As expected, both testing and optimising corresponded with a higher time-on-task and with increased solution quality. These findings are in line with studies showing that experts achieve better solutions than novices by going through more design iterations (Atman et al., 2007;Strimel et al., 2018b). Thirty-nine percent of the children made full use of the 3 minutes. Contrary to expectations, there was no significant linear correlation between time-on-task and solution quality. Descriptive data showed that children who succeeded in building an adequate prototype did not necessarily make full use of the time. This suggests that the task was easy enough for these children. Moreover, children who did not succeed in the task at all did not make full use of the time either. This might be an indicator that the task was not sufficiently interesting for these children (Crismond, 2001) or that they did not succeed in inhibiting other stimuli (Miyake et al., 2000). Approximately 50% of the children opted to make changes to their solution when explicitly asked by the interviewer. There was no significant correlation between willingness to make changes and solution quality. Moreover, as descriptive data showed, the proportion of children who wanted to make a change to their construction was highest among those with an incomplete prototype, whereas the proportion was low for children with a complete, but inadequate prototype. These findings suggest that children's testing and optimising mainly aims at the completion of one single construction rather than at constructing and evaluating different prototypes, which is in line with other studies (Bairaktarova et al., 2011;Strimel et al., 2018a). Interestingly, a substantial proportion of the children (43%) who built a correct prototype wanted to continue building and making changes.
We cannot say from our data whether this reflects what Lucas et al. (2014) call the engineering habit of mind of relentlessly trying to make things work better, or whether the children just enjoyed building with the gears without pursuing such a goal.
We examined various aspects that might be related to children's persistence and their testing and optimising behaviour. We will discuss them in the following sections.

Goal awareness
In line with a study by Kendall (2015), the children in our study had difficulties in recalling the two specifications on their own. The majority of the children stated independently of the interviewer that the carousel should rotate when they turn the crank, but they specified the turning direction only when explicitly asked. This suggests that when children were introduced to the task, they built an accurate mental representation of the goal state (Kintsch, 2007), but they might have 'simplified' this goal representation during the course of the 3-minute engineering phase (Karmiloff-Smith, 1979). Even after the probing of the interviewer, one out of five children did not provide a complete description of how the construction was supposed to work. These children may have built an incomplete initial mental model of the goal state, or may have difficulties in maintaining an adequate representation of the goal even with well-defined problems. This difficulty may be related to the limited capacity of working memory in 4- to 7-year-old children (Diamond, 2013). Future studies should therefore examine the role of working memory and other executive functions for engineering in early childhood, e.g., using the EF touch battery (Willoughby et al., 2013). As expected, goal awareness was positively related to solution quality and optimising. Contrary to expectations, we found no significant correlation between goal awareness and testing. This lack of correlation may indicate that children tested regardless of whether they held a correct and complete goal representation.

Self-evaluation
Overall, the majority of the children were satisfied with their solution. Moreover, as expected, the children who were satisfied with their solution were less willing to make changes to their prototype when asked by the interviewer. However, there was no significant linear correlation between self-evaluation and solution quality.
Instead, descriptive data showed indications of a U-shaped relation: both the children who built an adequate prototype and the children who finished trial 1 with constructions showing no evidence of a prototype were satisfied. In contrast, the proportion of satisfied children was lower among those who built an incomplete prototype or a complete but inadequate prototype. These findings are partly in line with research showing that self-evaluations of 4- to 7-year-olds might be unrealistically positive (Harter and Bukowski, 2015;Marsh et al., 2012;Oppermann et al., 2018). This bias in self-evaluation could be one reason why kindergarten children use little or no iterative approach when working on an engineering design task (Gold et al., 2020;Strimel et al., 2018a). However, as indicated by findings of Kendall (2015) and Lottero-Perdue and Tomayko (2020), children are willing to improve their solutions if the requirements are made clear to them and if they are instructed to test their solutions. In this respect, a methodological shortcoming of our study is the interviewer's question "Are you satisfied with your solution?", which might be too general. Had the question been related to the demanded task specifications (e.g., "Does your carousel turn in the demanded direction?" or "Compare your solution to the demanded requirements."), the children might have been able to evaluate their solution in more depth. However, since we had stated the requirements a few seconds before, when probing for children's goal awareness, we assume that it was possible for the children to relate our question to the task specifications. Nevertheless, a future study could examine whether children come to a more realistic assessment if they are given an explicit criterion for evaluation (Crismond, 2001;Lottero-Perdue and Tomayko, 2020).

Domain-specific content knowledge
We found no significant correlations of domain-specific content knowledge with the other variables in our data. This result indicates that it was possible for the children to test and optimise and to succeed in building an adequate prototype, even without specific knowledge of meshed gears' turning directions. As can be seen in the descriptive data, among the children with the correct solution, one third each had a correct, an incorrect, or no apparent concept of gears' turning direction. However, as Lottero-Perdue and Tomayko (2020) also note in their study, task characteristics might have elicited this result. In our study, it was evidently possible to solve the task in a short time. Exploring the gears and observing their behaviour thoroughly might have been sufficient to succeed. Support for this interpretation can be found in Schulz et al.'s (2007) findings that children are able to produce evidence to infer causal structures of a mechanism by thorough testing. However, the descriptive data also show that the better the solution, the greater the proportion of children with the correct concept of gears' turning direction. This is in line with studies indicating that prior knowledge in the relevant science domain can contribute to better engineering processes and outcomes (Benjamin et al., 2010;Marcus et al., 2021). Future studies should therefore try to differentiate whether successful solutions are achieved through trial-and-error procedures or through the application of content knowledge.
Moreover, future research might investigate and compare kindergarten children's engineering behaviour in different science domains, e.g., the stability of block buildings (Weber et al., 2020), one-sided levers as simple machines (Leuchter and Naber, 2018), or floating of objects (Hong and Diamond, 2012), and pursue two directions: (1) Studies should examine for different science domains whether kindergarten children can learn domain-specific content knowledge by being exposed to engineering design tasks (Lin et al., 2020;Penner et al., 1997). (2) Studies might compare children who received training in the relevant domain-specific knowledge beforehand with children who did not (Benjamin et al., 2010;Marcus et al., 2021). This would provide insights into children's failure analysis and the role of domain-specific content knowledge for optimising.

Spatial skills and fluid intelligence
Mental rotation ability correlated positively with figural reasoning ability. Moreover, mental rotation ability was positively related to solution quality, indicating that children with better spatial skills achieved better solutions. This is in line with research suggesting that spatial skills are positively related to success in STEM (Wai et al., 2009). However, our study showed no significant correlation between spatial skills and testing and optimising. This is in contrast to findings from observations with older children (Ramey and Uttal, 2017). Furthermore, figural reasoning was not significantly related to any of the other variables. These findings indicate that children can engage with the engineering design process from an early age regardless of their spatial skills and their fluid intelligence, which is in line with observational studies of young children's engineering behaviour (for an overview, see Gold et al., 2020).
Our results highlight some practical implications. With respect to engineering intervention programs for kindergarten children, our findings emphasize the important role of teachers. They should learn techniques that help children to tackle challenges they might face with respect to persisting, testing and optimising, such as asking the children whether they remember the goal, helping them to retrieve the goal specifications, stating that nobody succeeds on the first attempt, and motivating them to remain on track (Lottero-Perdue and Tomayko, 2020).

Limitations
First, in the present study, the test power was limited; thus, we might not have discovered relations in our sample that exist in the population. A larger sample size would also allow us to apply path models to describe the directed dependencies among the variables. Moreover, with a larger sample size it might be worthwhile to examine whether there are (latent) "engineering profiles" among young children, e.g., some children who stand out by their talent for thinking up creative solutions, and others who are better at critical evaluation, testing and optimisation of a solution. Second, we used only one engineering task in one domain (gears) with one aspect (turning direction). Future studies could use problems on the relative turning speed of gears. Third, we interpreted children's goal awareness, self-evaluation, and task persistence from our observations of the problem-solving process and from the children's answers to the interviewer's questions. This procedure seems to be appropriate as a first step in order to obtain an impression of which aspects might be relevant for children's testing and optimising behaviour. However, future studies should use standardized and validated test instruments to assess children's executive functioning, self-evaluation, and general task persistence. Fourth, the procedure we used in the present study did not allow us to measure competencies developed or demonstrated in group settings, which are important both in school settings and engineering practice, where communication and teamwork are essential (Lucas et al., 2014). Future studies might therefore use group settings as well.
However, to date, not many studies have been conducted in this area and with this age group. Therefore, we assume that the knowledge gained from our exploratory study is worthwhile. We have given the children a well-defined problem, conducted and videotaped the problem-solving process in a standardized way, developed a coding scheme to analyse children's testing and optimising and examined control variables. Thus, we were able to relate persistence and testing and optimising to solution quality and to the control variables. With our study, we have contributed to the development of further studies to investigate young children's EDP competencies as a relevant part of early STEM learning.