The “household type” variable is available in the structural survey (SS), an annual survey carried out on a sample of approximately 300 000 persons. The survey results yield direct and reliable estimates only for spatial divisions of at least 15 000 persons. This threshold drops to around 3 000 persons when structural survey data are pooled over 5 years. Results for the household typology are therefore not available for the whole population.
The aim of this project is to obtain a “household type” variable for all private households in the permanent resident population at the main place of residence. This makes available:
exhaustive household types at individual level as a variable for analysis in other areas (e.g. matching with other data to compare income from employment by household type);
results on household types at a precise spatial division (e.g. to obtain the distribution of household types by commune).
As in the structural survey, the household type in the Population and Households Statistics (STATPOP) is calculated on the basis of all the relationships between the different members of the household. The data sources are:
STATPOP: the STATPOP population contains, in particular, the relationships obtained from the computerised civil status register (Infostar) for persons who have had a civil status event (marriage, birth, etc.) since the end of the 1990s, as well as relationships obtained from the information system of the Swiss Federal Department of Foreign Affairs for diplomats and international organisation staff (Ordipro);
Structural survey: a relationship table is created on the basis of information provided by respondents during surveys from previous years.
In addition to these two sources, deterministic algorithms were used to identify a certain number of additional relationships (e.g. two people of different sex, married on the same date and living in the same household, are defined as husband-wife). These procedures thus made it possible to attribute a household type to 86% of households.
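The example rule quoted above (different sex, same marriage date, same household) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` record, its field names and the sex coding are hypothetical and do not reflect the actual STATPOP data model or the FSO's production code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, for illustration only.
    person_id: int
    household_id: int
    sex: str                        # "M" or "F" (assumed coding)
    marriage_date: Optional[date]   # None if not married or unknown


def infer_spouse_pairs(persons):
    """Return (id_a, id_b) pairs matching the rule quoted in the text."""
    # Group persons by household so only co-residents are compared.
    by_household = defaultdict(list)
    for p in persons:
        by_household[p.household_id].append(p)

    pairs = []
    for members in by_household.values():
        for a, b in combinations(members, 2):
            if (
                a.marriage_date is not None
                and a.marriage_date == b.marriage_date
                and a.sex != b.sex
            ):
                pairs.append((a.person_id, b.person_id))
    return pairs


persons = [
    Person(1, 100, "M", date(2005, 6, 18)),
    Person(2, 100, "F", date(2005, 6, 18)),
    Person(3, 100, "F", None),
    Person(4, 200, "M", date(2005, 6, 18)),  # same date, other household
]
print(infer_spouse_pairs(persons))  # [(1, 2)]
```

The sketch does not handle ambiguous cases, such as two couples in the same household who married on the same date; a real procedure would need an explicit rule for those.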
Several approaches were tested to impute missing relationships or missing household types for the remaining 14% of households: machine learning (random forest), deterministic algorithms developed by the Statistical Office of the Canton of Vaud, a decision tree and multinomial regression. Performance studies and quality estimates identified the decision tree as the best approach in this context.
Two versions of the “household type” variable were created for STATPOP households from 2017 to 2020: one in which children are counted up to the age of 18, and one in which they are counted up to the age of 25.
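To illustrate why the two versions can classify the same household differently, the following minimal Python sketch applies both age limits to one set of ages. The function name is hypothetical, and treating the limit as exclusive (age strictly below the limit) is an assumption; the text does not specify whether the limit is inclusive.

```python
def children_under_limit(ages_in_child_role, age_limit):
    """Return the ages counted as children under the given limit.

    Assumption: the limit is exclusive (age < age_limit).
    """
    return [age for age in ages_in_child_role if age < age_limit]


# Hypothetical household: three persons in a child role.
ages = [12, 20, 27]
for limit in (18, 25):
    print(limit, children_under_limit(ages, limit))
# 18 [12]
# 25 [12, 20]
```

Under these assumptions, the 20-year-old is counted as a child only in the version with the limit of 25.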
Limitations of data
As the STATPOP household type variables are partly imputed, they are suitable as variables for analysis but are not necessarily reliable at individual level for imputed households. As these variables are still experimental, this must be stated in any publication based on them.
The FSO continues to test improvements to the quality of the imputations and to the estimation of error. Apart from the decision tree approach, improvements remain possible with the random forest algorithm. The aim is to minimise differences between the results of the two “best” approaches in order to consolidate the results obtained. In the medium term, it is planned to integrate the “household type” variable into the standard STATPOP production.
This methodological document is only available in French.