A HAPPY + DeBERTa-based language model that predicts polymer properties from SMILES. Each subgroup token's contribution to the prediction is explained via Integrated Gradients.
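Integrated Gradients assigns each input feature the product of its offset from a baseline and the gradient averaged along the straight path from that baseline to the input. The sketch below shows the idea on a toy differentiable function. It assumes attributions are taken with respect to per-token input vectors, uses a zero baseline and a 64-step midpoint Riemann sum, and sums over the embedding dimension to obtain one score per subgroup. All of these are illustrative choices, not the app's documented settings. In the real model, `F` would be the fine-tuned Tg output and `grad_F` would come from autodiff, for example in PyTorch.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    grad_fn(z) returns dF/dz with the same shape as z.
    x, baseline: arrays of shape (num_tokens, embed_dim).
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    total_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        # Gradient at a point on the straight line from baseline to x
        total_grad += grad_fn(baseline + a * (x - baseline))
    # Offset from baseline times the path-averaged gradient
    return (x - baseline) * (total_grad / steps)

def per_token_contribution(attributions):
    # Assumption: one score per subgroup = sum over embedding dimensions
    return attributions.sum(axis=-1)

# Toy "model": F(z) = sum(W * z**2), so dF/dz = 2 * W * z
W = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])  # 3 tokens x 2 dims

def F(z):
    return float(np.sum(W * z**2))

def grad_F(z):
    return 2.0 * W * z

x = np.array([[1.0, 1.0], [2.0, 1.0], [0.5, 4.0]])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_F, x, baseline)
print(per_token_contribution(attr))     # approximately [3.0, 1.0, 0.75]
print(attr.sum(), F(x) - F(baseline))   # completeness: both approximately 4.75
```

The final line checks the completeness property of Integrated Gradients. The per-token scores sum to the difference between the model output at the input and at the baseline, so each subgroup's contribution can be read as its share of that difference.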
Chemically Informed Language Model combining HAPPY tokens with per-subgroup RDKit/Mordred descriptors.
Polymer SMILES → HAPPY tokens. Each token = one chemical subgroup (4-char code).
256 RDKit/Mordred descriptors per subgroup injected alongside token embeddings.
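The source does not say how the descriptors are combined with the embeddings. A minimal sketch of one common option follows. Each subgroup's 256-dimensional descriptor vector is linearly projected to the model's hidden size and added to that subgroup's token embedding. The vocabulary size, the hidden size (768, as in DeBERTa-base), the random initialization, and the additive fusion are all assumptions. The helper name `fuse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 100   # hypothetical number of HAPPY subgroup tokens
HIDDEN = 768       # hypothetical hidden size (DeBERTa-base uses 768)
N_DESC = 256       # descriptors per subgroup, as stated above

# Randomly initialized stand-ins for learned parameters
token_embedding = rng.normal(scale=0.02, size=(VOCAB_SIZE, HIDDEN))
desc_proj_W = rng.normal(scale=0.02, size=(N_DESC, HIDDEN))
desc_proj_b = np.zeros(HIDDEN)

def fuse(token_ids, descriptors):
    """Hypothetical fusion: token embedding + linear projection of descriptors.

    token_ids:   int array, shape (seq_len,)
    descriptors: float array, shape (seq_len, N_DESC), assumed standardized
    returns:     float array, shape (seq_len, HIDDEN), fed to the encoder
    """
    tok = token_embedding[token_ids]                   # look up subgroup embeddings
    desc = descriptors @ desc_proj_W + desc_proj_b     # project descriptors to HIDDEN
    return tok + desc                                  # element-wise sum

ids = np.array([5, 17, 5])
desc = rng.normal(size=(3, N_DESC))
out = fuse(ids, desc)
print(out.shape)  # (3, 768)
```

Concatenating the embedding and descriptor vectors and then projecting the result is an equally plausible design. Addition keeps the input width unchanged, so a standard DeBERTa encoder can consume the fused vectors without modification.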
Pretrained with masked language modeling (MLM) on 10,647 polymers, then fine-tuned for property prediction.
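Masked language modeling hides some input tokens and trains the model to recover them. The sketch below uses the standard BERT-style scheme: about 15% of positions are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. The masking rate and split are assumptions, since the source does not state them. The `[MASK]` id is also hypothetical, and for brevity the sketch does not exclude special tokens from masking.

```python
import numpy as np

MASK_ID = 1     # hypothetical [MASK] token id
IGNORE = -100   # label meaning "no loss here" (common PyTorch convention)

def mask_tokens(ids, vocab_size, rng, mask_prob=0.15):
    """BERT-style masking: returns (model inputs, MLM labels)."""
    ids = np.asarray(ids)
    inputs = ids.copy()
    labels = np.full_like(ids, IGNORE)

    selected = rng.random(ids.shape) < mask_prob
    labels[selected] = ids[selected]   # loss only at selected positions

    r = rng.random(ids.shape)
    to_mask = selected & (r < 0.8)                   # 80%: [MASK]
    to_random = selected & (r >= 0.8) & (r < 0.9)    # 10%: random token
    inputs[to_mask] = MASK_ID
    inputs[to_random] = rng.integers(0, vocab_size, size=int(to_random.sum()))
    # Remaining 10% of selected positions keep their original token
    return inputs, labels

rng = np.random.default_rng(0)
ids = 5 + np.arange(1000) % 195   # toy token ids in [5, 199]
inputs, labels = mask_tokens(ids, 200, rng)
```

Keeping some selected tokens unchanged, and corrupting others with random tokens, stops the model from assuming that every non-`[MASK]` token is correct.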
Enter a polymer SMILES (* marks a chain-end attachment point). The model returns the predicted glass transition temperature Tg (K) together with each subgroup's contribution.
Integrated Gradients attributions precomputed over the polyinfo dataset. Filter by multiple properties at once; only subgroups that satisfy all conditions are shown. Hover over a card to see full details.
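The "satisfy all conditions" rule is a logical AND across the selected property filters. A minimal sketch follows, assuming each subgroup stores one precomputed attribution per property and each filter is a `(property, operator, threshold)` triple. The data layout, the subgroup codes, the property keys other than Tg, and the numbers are all made up for illustration. `filter_subgroups` is a hypothetical helper.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def filter_subgroups(scores, conditions):
    """Return subgroup codes that satisfy every (property, op, threshold) condition.

    scores: {subgroup_code: {property_name: attribution}}
    A subgroup with no score for a filtered property is excluded.
    """
    kept = []
    for code, props in scores.items():
        if all(prop in props and OPS[op](props[prop], thr)
               for prop, op, thr in conditions):
            kept.append(code)
    return kept

# Illustrative values only; codes and property keys are hypothetical
scores = {
    "S001": {"Tg": 12.0, "Tm": 4.0},
    "S002": {"Tg": 8.5, "Tm": -2.0},
    "S003": {"Tg": -3.0, "Tm": 6.0},
}
print(filter_subgroups(scores, [("Tg", ">", 0), ("Tm", ">", 0)]))  # ['S001']
```

Adding a filter can only shrink the result set, which matches the behavior described above.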