Forward Model — Property Prediction

Predict Polymer Properties

A DeBERTa-based language model operating on HAPPY tokens predicts polymer properties from SMILES. Integrated Gradients attributes each prediction to the individual subgroup tokens.
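To make the attribution step concrete, here is a minimal sketch of Integrated Gradients on a toy function. It is not the page's implementation: `toy_model`, the all-zero baseline, and the finite-difference gradients are assumptions chosen so the example runs with no autodiff library. A real model would typically compute gradients by backpropagation with respect to token embeddings and sum them per token.

```python
from typing import Callable, List


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate IG attributions with a midpoint Riemann sum.

    Gradients are estimated by central finite differences, which keeps
    the sketch dependency-free.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = point.copy()
            down = point.copy()
            up[i] += eps
            down[i] -= eps
            avg_grad[i] += (f(up) - f(down)) / (2 * eps) / steps
    # Scale the path-averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(v: List[float]) -> float:
    """Hypothetical stand-in for the property model (three scalar inputs)."""
    return 2.0 * v[0] + v[1] * v[2] + v[2] ** 2


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)

# Completeness: attributions sum to f(x) - f(baseline), up to numerical error.
gap = sum(attributions) - (toy_model(x) - toy_model(baseline))
```

The completeness check at the end is a useful sanity test for any IG implementation: if `gap` is not close to zero, the number of integration steps is probably too small.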

Architecture

CI-LLM Forward Model

Chemically Informed Language Model combining HAPPY tokens with per-subgroup RDKit/Mordred descriptors.

🧩

HAPPY Tokenizer

Polymer SMILES → HAPPY tokens. Each token = one chemical subgroup (4-char code).

768 base subgroups + 100 FORGE
📐

Descriptor Injection

256 RDKit/Mordred descriptors per subgroup injected alongside token embeddings.

256 descriptors × each token
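The page does not say how the descriptors are combined with the token embeddings. One common option, assumed here purely for illustration, is to project each token's 256-dimensional descriptor vector to the hidden size and add it to the embedding. The hidden size, token count, and random weights below are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def project(descriptors: Vector, weight: List[Vector]) -> Vector:
    """Linear map from descriptor space to embedding space (no bias)."""
    return [sum(w * d for w, d in zip(row, descriptors)) for row in weight]


def inject(
    token_embs: List[Vector], descs: List[Vector], weight: List[Vector]
) -> List[Vector]:
    """Add each token's projected descriptors to its embedding (assumed scheme)."""
    return [
        [e + p for e, p in zip(emb, project(desc, weight))]
        for emb, desc in zip(token_embs, descs)
    ]


rng = random.Random(0)
N_DESC = 256   # descriptors per subgroup, as stated on the page
HIDDEN = 8     # illustrative hidden size, not the model's real value
N_TOKENS = 3   # illustrative sequence length

weight = [[rng.gauss(0, 0.02) for _ in range(N_DESC)] for _ in range(HIDDEN)]
token_embs = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(N_TOKENS)]
descs = [[rng.random() for _ in range(N_DESC)] for _ in range(N_TOKENS)]

fused = inject(token_embs, descs, weight)  # one fused vector per token
```

Concatenation followed by a projection would be an equally plausible design; addition keeps the encoder input size unchanged.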
🤖

DeBERTa Encoder

Pretrained with masked language modeling (MLM) on 10,647 polymers, then fine-tuned for property prediction.

MLM → fine-tune → predict
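As a rough illustration of the MLM pretraining objective, the sketch below hides a fraction of the tokens and keeps the originals as prediction targets. The 15% rate, the `[MASK]` string, and the four-character token codes are assumptions for the example, not values taken from the model; BERT-style random and keep-as-is replacements are omitted for brevity.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # assumed mask symbol


def mask_tokens(
    tokens: List[str], rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of tokens with MASK.

    Returns (inputs, labels); labels hold the original token at masked
    positions and None elsewhere, so the loss covers masked positions only.
    """
    rng = random.Random(seed)
    n_mask = min(len(tokens), max(1, round(rate * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, labels


# Placeholder subgroup codes; real HAPPY codes will differ.
tokens = ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF", "GGGG", "HHHH"]
inputs, labels = mask_tokens(tokens)
```

During pretraining the encoder sees `inputs` and is trained to recover the non-`None` entries of `labels`.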
SMILES → HAPPY tokens + descriptors → DeBERTa → Tg prediction

Interactive Demo

Predict Glass Transition Temperature

Enter a polymer SMILES string, using * to mark each chain-end attachment point. The model returns the predicted Tg in kelvin (K) together with each subgroup's contribution.

Explainability

Subgroup Contribution Explorer

Precomputed Integrated Gradients attributions over the PoLyInfo dataset. Filter by multiple properties simultaneously; only subgroups that satisfy all conditions are shown. Hover over a card to see full details.
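The all-conditions filter can be sketched as a simple conjunction over per-property ranges. The `SubgroupCard` record, the subgroup codes, and the contribution values below are made up for illustration; the explorer's actual data format is not shown on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SubgroupCard:
    code: str                   # hypothetical 4-character subgroup code
    contrib: Dict[str, float]   # property name -> IG contribution


def filter_cards(
    cards: List[SubgroupCard], conditions: Dict[str, Tuple[float, float]]
) -> List[SubgroupCard]:
    """Keep cards whose contribution lies in [lo, hi] for every listed property."""
    return [
        card
        for card in cards
        if all(
            prop in card.contrib and lo <= card.contrib[prop] <= hi
            for prop, (lo, hi) in conditions.items()
        )
    ]


cards = [
    SubgroupCard("AAAA", {"Tg": 12.0, "Tm": 8.0}),
    SubgroupCard("BBBB", {"Tg": 15.0, "Tm": -3.0}),
    SubgroupCard("CCCC", {"Tg": -4.0, "Tm": 6.0}),
]

# Require a non-negative contribution to both Tg and Tm.
conditions = {"Tg": (0.0, float("inf")), "Tm": (0.0, float("inf"))}
selected = filter_cards(cards, conditions)

# Sort the surviving cards by Tg contribution, largest first.
selected.sort(key=lambda card: card.contrib["Tg"], reverse=True)
```

A card missing one of the requested properties is excluded, which matches the "all conditions" wording above.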

Filter properties: Tg, Eg, Tm, ρ. Results can also be sorted.