tags. For example, given the HTML below:
This is the first sentence.
This is the second sentence.
var container = document.getElementById("container");
var text = container.innerText || container.textContent; // the text I want
will return "This is the first sentence.This is the second sentence." with no space between the first period and the start of the second sentence.

My overall goal is to parse the text with Stanford CoreNLP, but its parser cannot detect that these are two sentences because they are not separated by a space. Is there a better way to extract text from HTML so that the sentences are separated by a space character?

The HTML I'm parsing will have the text I want mostly in tags, but the HTML may also contain , , and other tags embedded between
tags.
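
One possible direction, sketched here as an illustration rather than a tested solution: instead of reading innerText or textContent from the container, walk the DOM tree, collect each text node separately, and join the pieces with a single space. The helper name extractSpacedText is hypothetical (it is not a browser or CoreNLP API), and the sketch assumes that every text node boundary is an acceptable place for a space.

```javascript
// Hypothetical helper: gather all text nodes under `root` and join them with a space.
// It relies only on nodeType, nodeValue, and childNodes, which real DOM nodes provide.
function extractSpacedText(root) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  var pieces = [];

  function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      // Trim surrounding whitespace and skip text nodes that are whitespace only.
      var trimmed = node.nodeValue.trim();
      if (trimmed) {
        pieces.push(trimmed);
      }
      return;
    }
    // Recurse into child nodes in document order.
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  }

  walk(root);
  return pieces.join(" ");
}

// In a browser (assuming an element with id "container" exists):
// var text = extractSpacedText(document.getElementById("container"));
```

One caveat with this approach: a space is inserted at every text node boundary, so an inline tag that ends right before punctuation (for example a link followed by a period) would yield "link ." rather than "link.". Text inside script or style elements would also be collected unless those elements are skipped.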