Machine Learning SoSi (ML_SoSi)

Last update: 24.09.2019

Experimental statistics


Clustering typical prospective trajectory patterns of benefit receipt in the social security system and of employment, and estimating cluster membership from individual characteristics and retrospective process data using a machine learning approach.


Based on a cohort of people entering social assistance and/or unemployment insurance, benefit and employment trajectories can be observed prospectively for a certain period using SHIVALV+IK data. These combined trajectories will be described using data-driven methods and grouped according to their similarity or dissimilarity (sequence clustering). Subsequently, the probability of belonging to each of the identified trajectory clusters will be calculated for each person, using information that was available at the time the person entered the system. This information includes structural variables such as age, sex and household type, as well as retrospective information on employment and social benefit receipt prior to entering the system. The result is therefore an estimate of the probable case trajectory for a defined period starting from the moment of entering the system. A machine learning approach will be applied for this purpose: the model (features, estimation method and its generalisation parameters, weights, etc.) will be determined inductively by means of systematic experimentation. On this basis, aggregated prognosis indicators for the frequency of certain trajectory patterns can be developed.
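To make the sequence clustering step more concrete, the following is a minimal sketch in Python using only the standard library. It is not the project's actual method: the monthly state coding (E, U, S), the toy trajectories, the use of Hamming distance as the dissimilarity measure and the simple k-medoids procedure are all assumptions chosen for illustration. Sequence analysis in practice often relies on other measures, such as optimal matching.

```python
# Hypothetical monthly state coding (an assumption, not taken from the data):
# E = employed, U = unemployment insurance, S = social assistance
trajectories = {
    "p1": "UUUEEEEEEEEE",
    "p2": "UUEEEEEEEEEE",
    "p3": "UUUUUUSSSSSS",
    "p4": "UUUUUSSSSSSS",
    "p5": "SSSSSSSSSSSS",
    "p6": "UEEEEEEEEEEE",
}


def hamming(a, b):
    """Number of months in which two equal-length trajectories differ."""
    if len(a) != len(b):
        raise ValueError("trajectories must cover the same number of months")
    return sum(x != y for x, y in zip(a, b))


def k_medoids(seqs, k, max_iter=100):
    """Group trajectories around k representative trajectories (medoids)."""
    ids = list(seqs)
    dist = {(i, j): hamming(seqs[i], seqs[j]) for i in ids for j in ids}

    # Deterministic farthest-first initialisation, starting with the first person.
    medoids = [ids[0]]
    while len(medoids) < k:
        rest = [i for i in ids if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i, m] for m in medoids)))

    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each person joins the nearest medoid
        # (a medoid always stays in its own cluster).
        clusters = {m: [] for m in medoids}
        for i in ids:
            nearest = i if i in medoids else min(medoids, key=lambda m: dist[i, m])
            clusters[nearest].append(i)

        # Update step: the new medoid is the member with the smallest
        # total distance to all members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c, j] for j in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


clusters = k_medoids(trajectories, k=2)
for medoid, members in clusters.items():
    print(f"typical pattern {trajectories[medoid]}: {members}")
```

In this toy example the trajectories that lead quickly back to employment end up in one group and those that move into social assistance in the other. The number of clusters `k` is set by hand here; in a real analysis it would have to be chosen and validated as part of the experimentation.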


The institutional framework allows for a variety of trajectories into the social security system and out of it again, in other words back to economic independence. The potential for estimating individual trajectories in the social security system by means of inductive methods will be assessed and tested for reliability. Models for the early identification of high-risk trajectory patterns, i.e. those of people who have very little chance of being reintegrated into employment, will be calculated. We will also assess to what extent the indicators calculated using this method can be recalculated and published periodically.
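The estimation of cluster membership from entry characteristics, the flagging of high-risk patterns and the aggregated indicator can be illustrated with a small Python sketch. Everything in it is hypothetical: the cluster names, the two features, the made-up cohort, the 0.6 threshold and, above all, the estimator itself. A smoothed frequency table stands in for whatever machine learning model the experimentation would actually select.

```python
from collections import Counter

# Hypothetical cluster labels; "long_term_receipt" plays the high-risk pattern.
CLUSTERS = ("quick_return", "long_term_receipt")

# Made-up training cohort: characteristics known at entry, plus the
# trajectory cluster each person was later assigned to.
young = {"age_group": "18-29", "prior_benefits": False}
older = {"age_group": "50-64", "prior_benefits": True}
cohort = (
    [(young, "quick_return")] * 3 + [(young, "long_term_receipt")]
    + [(older, "quick_return")] + [(older, "long_term_receipt")] * 3
)


def cell_key(features):
    """Turn a feature dict into a hashable, order-independent key."""
    return tuple(sorted(features.items()))


def fit(records):
    """Count how often each cluster occurs for each combination of features."""
    counts = {}
    for features, cluster in records:
        counts.setdefault(cell_key(features), Counter())[cluster] += 1
    return counts


def predict_proba(counts, features, alpha=1.0):
    """Smoothed relative frequencies; alpha is additive (Laplace) smoothing."""
    cell = counts.get(cell_key(features), Counter())
    total = sum(cell.values()) + alpha * len(CLUSTERS)
    return {c: (cell[c] + alpha) / total for c in CLUSTERS}


model = fit(cohort)

# New entrants, described only by information available at entry.
new_entrants = [
    {"age_group": "18-29", "prior_benefits": False},
    {"age_group": "50-64", "prior_benefits": True},
    {"age_group": "30-49", "prior_benefits": True},  # combination not seen in training
]
probs = [predict_proba(model, f) for f in new_entrants]

# Early identification: flag entrants whose estimated probability of the
# high-risk pattern reaches a (hypothetical) threshold.
THRESHOLD = 0.6
high_risk = [i for i, p in enumerate(probs) if p["long_term_receipt"] >= THRESHOLD]

# Aggregated prognosis indicator: expected share of the high-risk pattern.
indicator = sum(p["long_term_receipt"] for p in probs) / len(probs)

print(probs)
print("flagged entrants:", high_risk)
print("expected share of long-term receipt:", round(indicator, 3))
```

Averaging the individual probabilities rather than counting flagged persons keeps the aggregated indicator independent of the threshold, which only matters for the individual early-identification step.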