I am absolutely new to AI/ML and need some guidance/direction.
Every "New to AI, try this" guide I find either goes down a path that isn't right for the project I'm working on, or is so convoluted with terms I need to look up that I get rather frustrated. Maybe I'm too old to learn/use AI? Anyway . . .
This is my project, and any guidance, pointers, or help would be super appreciated. I'm working on a job aggregator. I have a simple web crawler that goes to a URL, fetches the HTML, cleans out a lot of the text and structure, and outputs the content of the job posting.
I then go in manually, look at that simplified HTML, and extract the actual job description (vs. company description, benefits, and other stuff on a job posting) to be used in another database. I use the exact wording, straight copy and paste, no summarization or interpretation.
I have about 400 data points in a database that look like this:
job_site: "COMPANY_NAME",
raw_html: "Job TitleThis is what we doWe are looking for someone who",
job_description: "We are looking for someone who"
I extracted all of these manually. I feel like I can use them as training data to do some form of text . . . extraction ?? . . . from an HTML document, but I don't have any clue where to start.
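Since each job_description is a straight copy and paste from raw_html, one possible first step is to turn every record into a pair of character offsets marking where the description starts and ends. That is the label format span-extraction approaches usually train on. A minimal sketch, assuming the records are available as Python dicts with the three fields shown above; the `Record` type and the `locate_span` helper are hypothetical names made up for illustration:

```python
from typing import Optional, Tuple, TypedDict


class Record(TypedDict):
    """Hypothetical shape of one database row, mirroring the fields above."""
    job_site: str
    raw_html: str
    job_description: str


def locate_span(record: Record) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of job_description in raw_html.

    Returns None when the description is empty or is not a verbatim
    substring, which flags rows that need a manual second look.
    """
    text = record["raw_html"]
    target = record["job_description"].strip()
    if not target:
        return None
    start = text.find(target)  # first exact, case-sensitive match
    if start == -1:
        return None
    return start, start + len(target)


example: Record = {
    "job_site": "COMPANY_NAME",
    "raw_html": "Job TitleThis is what we doWe are looking for someone who",
    "job_description": "We are looking for someone who",
}

span = locate_span(example)
if span is not None:
    start, end = span
    # Slicing with the offsets gives back the pasted description.
    print(example["raw_html"][start:end])
```

Running this over all 400 rows would also show how many descriptions really are exact substrings. Any row that comes back as None (for example because whitespace changed during cleaning) would need fixing before it could serve as a training example.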
Thanks for this! I'll start learning!
A friend mentioned I should start with a pre-trained model, because 400 data points (growing by 50ish per week with my crawler) is not nearly enough on its own, and then do continued learning on that pre-trained model. Does that sound right?