Hi Peter,
The scoring model itself is probably the easier part to get right, given your own experience. The real problem is more likely the model inventing confident reasons a lead looks good that don't survive a second look, so the first thing that matters is forcing every reason to cite a real datapoint instead of a vibe.
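To make that concrete, here is a minimal sketch of what "cite a real datapoint" could look like, assuming the model returns each reason as a small record with a field name and a value. The field names (visited_pricing, employees) and the grounded_reasons helper are made up for illustration, not from any particular tool.

```python
def grounded_reasons(lead, reasons):
    """Keep only reasons whose cited field exists on the lead with the cited value."""
    kept = []
    for reason in reasons:
        field = reason.get("field")
        # A reason with no field, an unknown field, or a wrong value is dropped.
        if field in lead and lead[field] == reason.get("value"):
            kept.append(reason)
    return kept


# Hypothetical lead record and model output, for illustration only.
lead = {"employees": 120, "visited_pricing": True}
reasons = [
    {"text": "Visited the pricing page", "field": "visited_pricing", "value": True},
    {"text": "Seems like a great fit", "field": None, "value": None},
    {"text": "Enterprise-sized team", "field": "employees", "value": 5000},
]

print([r["text"] for r in grounded_reasons(lead, reasons)])
```

Only the pricing-page reason survives here: the "great fit" one cites nothing, and the "enterprise-sized" one cites a value the record doesn't back up.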
The question you're actually circling, using what converted to score what's new, is a different game, because it runs on your own criteria. Most setups store the past; they don't learn from it. Logging what converted is the easy part. Getting the system to quietly fade the patterns that stopped working and surface the ones that still do, without you re-tuning it by hand, is a completely different problem.
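One rough way to picture the "fading" part is a recency-weighted conversion rate per pattern, where older outcomes count for less. This is only a sketch under my own assumptions: the 90-day half-life is an arbitrary number, and pattern_weight is a hypothetical helper, not how any specific product does it.

```python
from datetime import date

HALF_LIFE_DAYS = 90  # assumed value; an outcome this old counts half as much


def pattern_weight(outcomes, today, half_life_days=HALF_LIFE_DAYS):
    """Recency-weighted conversion rate for one pattern.

    outcomes is a list of (date, converted) pairs, converted being True/False.
    """
    weighted_wins = 0.0
    total_weight = 0.0
    for when, converted in outcomes:
        age_days = (today - when).days
        weight = 0.5 ** (age_days / half_life_days)  # exponential decay with age
        weighted_wins += weight * converted
        total_weight += weight
    return weighted_wins / total_weight if total_weight else 0.0


# Hypothetical history: one old win, one fresh loss.
today = date(2024, 7, 1)
history = [(date(2024, 1, 3), True), (date(2024, 7, 1), False)]
rate = pattern_weight(history, today)
```

With an old win and a fresh loss, the plain conversion rate would be 50%, but the weighted one comes out at 0.2, so the pattern fades on its own as its wins age out.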
I went all the way down that rabbit hole with our own leads. We couldn't find a way to define our own scoring rules and criteria the way we wanted.
What does your setup look like right now, and are you scoring the lead, the message, or both?