👋 Hey RevOps friends — looking for some AI/data advice! We’re working on a project to normalize job titles into consistent “title buckets” for our CRM, and would love to tap into your collective wisdom on how to approach it. Here’s the idea: We have ~30,000 unique job titles from various lead sources, with many small variations pointing to the same kind of role. For example:
Dir. Career Counselling → Decision Maker
Director Career Counselling → Decision Maker
Dir, Career Counselling → Decision Maker
Career Counselling Director → Decision Maker
Our goal is to train an AI model that can automatically assign each new title we import to the right bucket, without relying on exact matches. Ideally, something like “Dir. Career Counsel” would still correctly map to “Decision Maker” based on the model’s training. We’re stuck on where to start:
Is there a preferred approach for this kind of title classification using AI?
Should we be looking at custom models, fine-tuning existing LLMs, or is there a more practical way?
What would the step-by-step process look like to build and train this, especially for a small team with limited ML experience?
Appreciate any pointers, similar projects you've tackled, or resources we should check out! 🙏
Where are you doing this roll-up/bucketing? CRM?
Yes, the end goal is to have the title bucket stored in Salesforce as a custom field on the contact or lead record. Ideally, the classification would happen before or during the import process, so that when we bring in lead lists (from sources like ZoomInfo, events, etc.), the job title is automatically mapped to the appropriate bucket. We’re open to where the AI processing happens — whether it’s part of a data pipeline (e.g., in Python or via a middleware like Workato, Tray.io, etc.), or potentially using Salesforce integrations like Flow, Apex, or a connected tool — as long as the final result is a clean, bucketed title field in Salesforce.
Not sure how big your team is, but for a small, data-savvy team starting out, a sensible approach is to begin with a zero-shot or pre-trained model to get immediate results, then evolve to a custom model if needed. The zero-shot approach will quickly show you how well an AI can bucket your titles and will surface the tricky cases. If the accuracy from zero-shot is satisfactory (it is often quite high for common roles), you save a ton of effort. If certain categories are consistently confused, that's a sign you may need a custom model or additional training data focused on those distinctions, or perhaps simply refined category definitions. In short, zero-shot gives a fast win and requires virtually no ML expertise, while a custom model can improve accuracy and reduce per-record cost in the long run, at the expense of upfront effort. Many teams stick with a pre-trained solution until scale (or specific errors) justifies investing in a tailored model.
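To make the zero-shot route concrete, here's a minimal Python sketch (assuming the OpenAI Python SDK v1+ with OPENAI_API_KEY set in the environment; the bucket list and model name are placeholders, not recommendations):

```python
# Minimal zero-shot sketch: classify a raw job title into one of a fixed
# set of buckets. Assumes the OpenAI Python SDK (openai>=1.0) and an
# OPENAI_API_KEY in the environment. BUCKETS is a hypothetical list.
from openai import OpenAI

client = OpenAI()

BUCKETS = ["Decision Maker", "Influencer", "Practitioner", "Other"]

def bucket_title(job_title: str) -> str:
    """Ask the model to pick exactly one bucket for a messy job title."""
    prompt = (
        f"Classify this job title into exactly one of: {', '.join(BUCKETS)}.\n"
        f"Job title: {job_title}\n"
        "Answer with the bucket name only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # deterministic answers for classification
    )
    answer = resp.choices[0].message.content.strip()
    # Guard against off-list answers so only valid buckets reach Salesforce.
    return answer if answer in BUCKETS else "Other"

print(bucket_title("Dir. Career Counsel"))  # expected: Decision Maker
```

In practice you'd dedupe first and cache the result per unique title string, so you pay once per distinct title (~30k uniques) and new lead lists mostly hit the cache, then write the bucket back to the Salesforce field via your pipeline or middleware. Happy to help or discuss this project further.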
Hey Jeffrey T., just wanted to share that I can help you build a lightweight AI automation for this using Make.com + the OpenAI API. There's no need for complex custom models or heavyweight ML infrastructure. It's cost-effective, easy to manage, and works great for mapping messy titles to clean buckets. Let me know if you'd be interested!
Jeffrey T., are you already using Clay? I assume not since you didn't mention it. But if you are, you can use the "Map Job Title to Persona" action to map job title keywords to buyer roles. This is very easy and inexpensive, and if your mappings change, you just update the action in the table and you're good to go. The downside would be if you have a very large volume of job title keywords and/or the mappings change frequently. If you don't already use Clay, you can build essentially the same thing in a SFDC flow, which is the cheapest and easiest option. This is a very simple use case where AI would be overkill, would cost you a ton by comparison, and would be far more difficult and expensive to maintain over time (not to mention the x% of hallucinations you'd need to handle).
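For reference, the Clay/flow-style logic is just an ordered keyword table with a first-match-wins lookup. Here's a quick Python sketch of the idea (the keywords and buckets are made-up examples, not a recommended taxonomy):

```python
import re

# Ordered keyword rules: first match wins, so more specific patterns
# (e.g. "director") come before their abbreviations (e.g. "dir").
# These rules are illustrative placeholders only.
KEYWORD_RULES = [
    (r"\bchief\b", "Decision Maker"),
    (r"\bvp\b", "Decision Maker"),
    (r"\bdirector\b", "Decision Maker"),
    (r"\bdir\b", "Decision Maker"),   # catches "Dir.", "Dir,", etc.
    (r"\bmanager\b", "Influencer"),
    (r"\bcounsellor\b", "Practitioner"),
]

def bucket_title(raw_title: str) -> str:
    """Return the first bucket whose keyword appears in the title."""
    title = raw_title.lower()
    for pattern, bucket in KEYWORD_RULES:
        if re.search(pattern, title):
            return bucket
    return "Other"  # unmatched titles fall through for manual review

assert bucket_title("Dir, Career Counselling") == "Decision Maker"
assert bucket_title("Career Counselling Director") == "Decision Maker"
```

The same first-match-wins table translates directly to a Salesforce Flow using decision elements with "contains" checks, and anything that falls through to "Other" can be queued for manual review.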
