Thursday, 12 September 2013

How to strip a part of the text obtained from web harvest

I am new to Web-Harvest and am using it to extract article data from a
website with the following statement:
let $text := data($doc//div[@id="articleBody"])
This is the data that the statement returns:
The Refine Spa (Furman's Mill) was built as a stone grist mill along the
on a tributary of Capoolong Creek by Moore Furman, quartermaster general
of George Washington's army
Notable people
Notable current and former residents of Pittstown include:
My question: is it possible to remove all of the content that comes after
"Notable people" through the Web-Harvest configuration? If it is possible,
please let me know how. Thanks.
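
One possible approach, offered as a minimal sketch rather than a confirmed Web-Harvest recipe: the standard XQuery function substring-before can cut the string at the marker. The sketch assumes that $doc is already bound as in the statement above, that exactly one div has the id "articleBody", and that the "Notable people" heading itself should be dropped along with everything after it.

```xquery
(: Sketch only: $doc is assumed to be bound exactly as in the statement above. :)
let $text := data($doc//div[@id="articleBody"])
return
  (: substring-before returns an empty string when the marker is absent,
     so fall back to the full text in that case. :)
  if (contains($text, "Notable people"))
  then substring-before($text, "Notable people")
  else $text
```

To keep the heading and drop only what follows it, the "then" branch could instead be concat(substring-before($text, "Notable people"), "Notable people"). The match is case-sensitive and cuts at the first occurrence of the marker.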
