My new Data Extraction Job at Screen Scraper

Well, I had this entry all finished, then accidentally pressed the back button on my mouse and lost it all. Very frustrating. So this version might be missing some of the details, because I don't know if I feel like writing everything over again.

I've been meaning to write about my new job for a while now. For the last month, I've been working at screen-scraper.com, the data extraction experts. At the end of last year, Todd Wilson sent me an email to see if I was interested in working with him. Todd worked at TALL with me until about a year ago, when he quit to focus full-time on his own business of helping other businesses extract data from the web. So far, the company has mainly extracted data from structured pages, where the data comes in predictable formats like tables, lists, or forms. Recently, though, he has had the opportunity to work with unstructured data, specifically real estate ads from online newspaper classifieds. If any of you have been through the classifieds, you'll know how unstructured the data can be. You have to take something like this:

HOFF. ESTS. Cstm Hm 4br,2ba,2c gar,2K sf, mba,stone front,vltd ceil, Fremd, great location, 4989 Chambers Dr. $369K. 224-588-2410

and pull out fields like the phone number, address, price, and number of bedrooms and bathrooms. It's an easy task for humans, but very difficult to automate with a computer. So Todd took me to lunch and asked if I would come work for him to build a framework for extracting unstructured data, both for this project and for future projects that might require it.
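To make that difficulty concrete, here is a minimal Python sketch of the naive approach: hand-written regular expressions tuned to the single sample ad above. This is an illustrative baseline only, not the framework described in this post. The function name `parse_ad` and every pattern are hypothetical, and they assume the ad uses abbreviations like `4br`, `2ba`, `$369K`, and a `ddd-ddd-dddd` phone number.

```python
import re

# Hypothetical patterns, tuned only to the sample ad; real classifieds vary far more.
PHONE = re.compile(r"\b(\d{3})[-.\s](\d{3})[-.\s](\d{4})\b")
PRICE = re.compile(r"\$(\d+(?:\.\d+)?)([KM]?)", re.IGNORECASE)
BEDROOMS = re.compile(r"(\d+)\s*br\b", re.IGNORECASE)
BATHROOMS = re.compile(r"(\d+(?:\.\d+)?)\s*ba\b", re.IGNORECASE)
# Assumes "number + capitalized street name + common suffix" (case-sensitive).
ADDRESS = re.compile(r"\b(\d+\s+[A-Z][a-z]+\s+(?:Dr|St|Ave|Rd|Ln|Blvd|Ct))\b")

SAMPLE_AD = (
    "HOFF. ESTS. Cstm Hm 4br,2ba,2c gar,2K sf, mba,stone front,vltd ceil, "
    "Fremd, great location, 4989 Chambers Dr. $369K. 224-588-2410"
)


def parse_ad(text: str) -> dict:
    """Return the fields the naive patterns can find; missing fields are None."""
    result = {}

    m = PHONE.search(text)
    result["phone"] = "-".join(m.groups()) if m else None

    m = PRICE.search(text)
    if m:
        # Expand shorthand like "$369K" into a whole-dollar amount.
        multiplier = {"": 1, "K": 1_000, "M": 1_000_000}[m.group(2).upper()]
        result["price"] = int(float(m.group(1)) * multiplier)
    else:
        result["price"] = None

    m = BEDROOMS.search(text)
    result["bedrooms"] = int(m.group(1)) if m else None

    # Bathrooms kept as a float because ads often list half baths (e.g. "1.5ba").
    m = BATHROOMS.search(text)
    result["bathrooms"] = float(m.group(1)) if m else None

    m = ADDRESS.search(text)
    result["address"] = m.group(1) if m else None

    return result


if __name__ == "__main__":
    print(parse_ad(SAMPLE_AD))
```

Patterns like these break as soon as an ad says "4 bdrm" or "two baths", or uses a street suffix not in the list, which is exactly why unstructured ads need a more general approach than one regex per field.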

One reason he came to me is that for the last year or two I've been involved with a research group at BYU that has dealt with some of these same problems. This was also one of the main things that attracted me to the job. I think it will be a nice thing to have in my back pocket, to be able to say "I did some research at BYU, then took those exact principles into the business world."

As excited as I was about the new opportunity, I knew I couldn't just drop my job at TALL; I have my hand in too many pies right now, and leaving everything at once would set the project back too much. I really like the purposes behind TALL, so I wouldn't want to hurt the work. So I'm currently splitting my time, spending 15 hours a week at each job (plus a few more at screen-scraper as I can fit them in). When I start a bowling class in March, I'll drop a few more hours at TALL, and eventually I'll fade off into TALL history.

I've really enjoyed working at screen-scraper so far. Working so close to the customers and their needs makes for quite a different atmosphere (though it can be stressful at times). It has been nice to feel like I'm having a real effect on the business as one of only eight employees.
