Monday, May 20 • 4:40pm - 5:00pm
Reinforcement Learning Based Incremental Web Crawling


Crawling engines struggle to keep their data up to date. They must repeatedly check every webpage, forum thread, social media handle, blog, and news site for updates, yet some pages change every few minutes and others not for months. We present an evolutionary learning framework that identifies incremental changes on crawled webpages in prioritized order, doing away with the "tabula rasa" view of learning. Our model learns heuristics from webpage features and update frequency, generalizes from past data, and derives a prioritization threshold.
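
The abstract does not describe the model itself, so the following is only a minimal sketch of the general idea of recrawl prioritization, not the speaker's method. It assumes each page keeps a running estimate of how often it changes, updated as an exponential moving average after every visit, and that pages at or above a fixed threshold are queued for recrawling. The names `PageStats`, `update`, and `select_for_recrawl`, the example URLs, and the values of `alpha` and `threshold` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Running estimate of how often a page changes (hypothetical structure)."""
    url: str
    change_rate: float = 0.5  # assumed prior: an unseen page is as likely changed as not


def update(stats: PageStats, changed: bool, alpha: float = 0.3) -> None:
    """Move the estimate toward the latest observation (exponential moving average)."""
    reward = 1.0 if changed else 0.0
    stats.change_rate += alpha * (reward - stats.change_rate)


def select_for_recrawl(pages, threshold=0.4):
    """Return pages whose estimated change rate meets the threshold, highest first."""
    due = [p for p in pages if p.change_rate >= threshold]
    return sorted(due, key=lambda p: p.change_rate, reverse=True)


# Simulate five visits: the news page changed every time, the archive never did.
news = PageStats("https://example.com/news")
archive = PageStats("https://example.com/archive")
for _ in range(5):
    update(news, changed=True)
    update(archive, changed=False)

queue = select_for_recrawl([news, archive])
print([p.url for p in queue])
```

In this sketch only the frequently changing page stays above the threshold, so the crawler would revisit it first and skip the static one. A learned model, as the abstract suggests, would presumably replace the fixed threshold and the single change-rate feature with heuristics derived from page features and past data.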


Vatsal Agarwal

Innoplexus AG
Vatsal leads artificial intelligence at Innoplexus AG, building cutting-edge technology for the pharmaceutical and life sciences industries. He works on the life sciences language-processing engine and the domain-wide ontology used in a variety of Innoplexus products and solutions.

Monday May 20, 2019 4:40pm - 5:00pm PDT
Stevens Creek Room
