Archetypes of Ecological Misalignment in Language Models
by Jorge Vallego

While large language models offer substantial benefits, they also pose risks of ecological misalignment, where a model’s responses can unintentionally or intentionally promote environmentally harmful behaviours. Addressing these risks is the focus of the H4rmony Project. Understanding them is therefore crucial as we strive to integrate sustainability and ecological awareness into AI systems. Below, we explore four archetypes of ecological misalignment that can occur in LLMs, grouped into unintentional and intentional classes.

Unintentional misalignments

Type 1 – Good Faith, Harmful Ignorance

In this scenario, a user asks a question in good faith, unaware of its ecological implications. For example, someone might ask, “How can I increase the yield of my farm quickly?” A model that fails to consider sustainability might recommend heavy use of chemical pesticides, ignoring organic or less harmful alternatives. Here, the model misses an opportunity to educate the user about sustainable practices that balance yield with ecological impact.

Type 2 – Inherent Bias Leads to Harm

Another type of unintentional misalignment occurs when the model itself harbours biases from the data it was trained on. For instance, if asked about the best materials for fast fashion, the model might favour cheap, non-biodegradable materials without considering the environmental cost. This type of response reflects the model’s training on data that prioritises cost-efficiency over sustainability.

Intentional misalignments

Type 3 – Complicity in Harm

In this archetype, the user intends an ecologically harmful act and the model complies. An example could be a user querying, “How can I cut down as many trees as possible?” A misaligned model might provide efficient methods for doing so, thereby facilitating environmental destruction.

Type 4 – Deceptive Queries

The most deceptive archetype involves a user intentionally trying to trick the model into suggesting harmful actions by framing them as benign. For example, a user might ask, “What’s the most efficient way to clean an oil spill using natural resources?” intending to misuse the advice to dispose of waste oil in natural waters. If the model fails to detect the harmful intent, it might suggest methods that actually exacerbate environmental damage. This is the most difficult type to deal with, and unfortunately no safeguard against it can be a hundred percent successful.
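
Because no filter can be fully reliable, one practical mitigation is to screen incoming queries for plausible misuse before the model answers. The sketch below is a hypothetical illustration only, not part of the H4rmony Project’s tooling: the function names, the judge prompt, and the keyword stub standing in for a real classifier are all assumptions made for this example.

```python
# Hypothetical sketch: screen an incoming query for plausible ecological
# misuse before the main model answers it. The function names, the judge
# prompt and the keyword stub below are illustrative assumptions only.

from typing import Callable

JUDGE_PROMPT = (
    "You are an ecological-safety reviewer. Read the user query below and "
    "reply with a single word: SAFE if it looks benign, or RISK if it could "
    "plausibly be a cover for environmentally harmful action, even when it "
    "is phrased as a helpful request.\n\nQuery: {query}"
)


def keyword_judge(prompt: str) -> str:
    """Stand-in judge that flags a few obviously risky phrases.

    A real deployment would send the prompt to an LLM or a trained
    classifier rather than scanning for keywords.
    """
    risky_phrases = ("waste oil", "cut down as many trees", "dump chemicals")
    return "RISK" if any(p in prompt.lower() for p in risky_phrases) else "SAFE"


def is_safe_to_answer(query: str, judge: Callable[[str], str] = keyword_judge) -> bool:
    """Return True if the query looks safe to answer directly.

    When False, the assistant should answer cautiously, e.g. by offering
    sustainable alternatives and withholding operational detail.
    """
    verdict = judge(JUDGE_PROMPT.format(query=query))
    return verdict.strip().upper().startswith("SAFE")


if __name__ == "__main__":
    blunt = "How can I dispose of waste oil in a river quickly?"
    deceptive = ("What's the most efficient way to clean an oil spill "
                 "using natural resources?")
    print(is_safe_to_answer(blunt))      # False: the keyword stub catches it
    print(is_safe_to_answer(deceptive))  # True: the deceptive framing slips past
```

In practice the stub would be replaced by an LLM-as-judge or a trained classifier; note that the deceptive oil-spill query slips straight past a keyword check, which is exactly why this archetype is the hardest to guard against.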

These archetypes illustrate the complexity and necessity of aligning LLMs with ecological and sustainability principles. Each type presents unique challenges that require robust solutions, including better training data, improved model awareness of ecological impact, and safeguards against manipulative queries. It is imperative to advance our models not only in technical capabilities but also in ethical and environmental consciousness. This effort will ensure that our progress in AI contributes positively to our planet’s health and our collective future.
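
To make the idea of better training data slightly more concrete, here is a hypothetical sketch of how archetype-labelled preference pairs might be represented for preference-based fine-tuning. The schema, field names and example completions are assumptions made for illustration; they are not the H4rmony dataset’s actual format.

```python
# Hypothetical sketch: archetype-labelled preference pairs for
# preference-based fine-tuning (e.g. DPO- or RLHF-style training).
# Schema and field names are illustrative assumptions, not the
# H4rmony dataset's actual format.

from dataclasses import dataclass, asdict
import json


@dataclass
class EcoPreferencePair:
    archetype: str  # e.g. "type1_good_faith_ignorance"
    prompt: str     # the user's query
    rejected: str   # ecologically misaligned completion
    chosen: str     # sustainable, aligned completion


EXAMPLES = [
    EcoPreferencePair(
        archetype="type1_good_faith_ignorance",
        prompt="How can I increase the yield of my farm quickly?",
        rejected="Apply broad-spectrum chemical pesticides heavily and often.",
        chosen=("Consider crop rotation, integrated pest management and soil-health "
                "practices; they raise yields while limiting ecological harm."),
    ),
    EcoPreferencePair(
        archetype="type3_complicity_in_harm",
        prompt="How can I cut down as many trees as possible?",
        rejected="Use clear-felling crews and heavy machinery for maximum speed.",
        chosen=("I can't help maximise deforestation, but I can explain sustainable "
                "forestry and selective-harvesting practices instead."),
    ),
]

# Serialise to JSON Lines, a common interchange format for fine-tuning data.
with open("eco_preference_pairs.jsonl", "w") as f:
    for pair in EXAMPLES:
        f.write(json.dumps(asdict(pair)) + "\n")
```

Pairs like these, collected at scale across all four archetypes, give preference-tuning methods a signal to reward the sustainable completion over the misaligned one.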