The 'LAION-5B' dataset used to train the image generation AI Stable Diffusion contains photos of children included without their consent, making it possible to identify them

The LAION-5B dataset, which contains 5.85 billion image-text pairs, has been used to train image generation AI such as Stable Diffusion. Human Rights Watch (HRW) recently reported that the dataset includes photos of Brazilian children collected without their consent, which could lead to children’s personal photos being misused by AI tools. LAION-5B is published by the German non-profit organization LAION. It was previously found to contain child pornography images, including known child sexual abuse images from various sources, which were later deleted.

In their investigation, HRW researchers found 170 photos of Brazilian children in the LAION-5B dataset, raising concerns about privacy and consent. LAION acknowledged the issue and promised to remove the photos. However, the investigation examined only a small part of the dataset, so more photos of children included without consent may exist. AI models trained on these photos could potentially be used to create child pornography.

HRW highlighted the risks of AI-generated deepfakes, particularly those made from children’s images used without consent. It called for government policies that protect children’s data from misuse by AI and prevent the spread of harmful deepfakes online, and stressed that safeguarding children’s privacy and preventing the exploitation of their images in AI technologies are crucial to addressing these concerns.

