Removing HTML Markup from Text
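When parsing HTML content, it’s often necessary to remove markup entirely. The handler in Listing 34-1 removes all HTML tags from the provided text and returns only the remaining text, that is, the contents of the tags. It scans the text one character at a time and skips everything between a < character and the next > character.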
APPLESCRIPT
on removeMarkupFromText(theText)
    -- tagDetected is true while the loop is between a "<" and the next ">"
    set tagDetected to false
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            -- Start of a tag: stop copying characters
            set tagDetected to true
        else if theCurrentCharacter is ">" then
            -- End of a tag: resume copying with the next character
            set tagDetected to false
        else if tagDetected is false then
            -- Outside any tag: keep the character
            set theCleanText to theCleanText & theCurrentCharacter as string
        end if
    end repeat
    return theCleanText
end removeMarkupFromText
Listing 34-2 shows how to call the handler in Listing 34-1.
APPLESCRIPT
set theText to "<a href=\"http://www.apple.com/mac\">This is a <B>great</B> time to own a Mac!</a>"
removeMarkupFromText(theText)
--> Result: "This is a great time to own a Mac!"
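Because the handler only tracks whether it is currently between a < and a >, it does not decode character entities, and it treats any literal < as the start of a tag. The calls below are a small illustrative sketch of those two edge cases. They assume the removeMarkupFromText handler from Listing 34-1 is defined in the same script, and the sample strings are made up for illustration.
APPLESCRIPT
-- Assumes removeMarkupFromText from Listing 34-1 is defined in this script.
-- Entities pass through unchanged because only < and > are examined.
removeMarkupFromText("<p>Fish &amp; chips</p>")
--> Result: "Fish &amp; chips"
-- A literal < is treated as a tag opener, so the text up to the next > is dropped.
removeMarkupFromText("1 < 2 and 3 > 2")
--> Result: "1  2"
The handler therefore suits well-formed markup in which < and > appear only as tag delimiters.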
Copyright © 2018 Apple Inc. All rights reserved. Updated: 2016-06-13