Cleaned Text Content: Full-body text in Markdown format (optimized for tokenizers)
Metadata Provenance: Original URL, Publish Date, Author, and Content Language
Structural Elements: Headings (H1–H6), Bullet points, and Table-to-JSON conversion
Knowledge Identifiers: Extraction of SKUs and GTINs for physical product knowledge graphs
Bringits AI-Ready Output
{
  // Cleaned Text Content
  "content_markdown": "## Fed raises rates by 25bps...",
  // Metadata Provenance
  "source_url": "https://reuters.com/...",
  "publish_date": "2025-06-14T09:32:00Z",
  "author": "John Smith",
  "language": "en",
  // Structural Elements
  "headings": ["H1", "H2", "H3"],
  "tables_as_json": true,
  // Knowledge Identifiers
  "sku": "AH8050-100",
  "gtin": "00194501956887"
}
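
The sample record above carries `//` annotation lines, which strict JSON parsers reject. The following is a minimal sketch of how a consumer might load such a record, assuming the annotations are illustrative only and can be dropped before parsing. The names `Record`, `strip_comment_lines`, and `parse_record` are hypothetical and are not part of any Bringits API; the field names are taken from the sample as shown.

```python
import json
from dataclasses import dataclass

# The sample record from above, reproduced verbatim (including the elided URL).
SAMPLE = """{
// Cleaned Text Content
"content_markdown": "## Fed raises rates by 25bps...",
// Metadata Provenance
"source_url": "https://reuters.com/...",
"publish_date": "2025-06-14T09:32:00Z",
"author": "John Smith",
"language": "en",
// Structural Elements
"headings": ["H1", "H2", "H3"],
"tables_as_json": true,
// Knowledge Identifiers
"sku": "AH8050-100",
"gtin": "00194501956887"
}"""


@dataclass
class Record:
    """Hypothetical typed view of one output record (fields mirror the sample)."""
    content_markdown: str
    source_url: str
    publish_date: str  # kept as the raw ISO 8601 string
    author: str
    language: str
    headings: list[str]
    tables_as_json: bool
    sku: str
    gtin: str  # kept as a string so leading zeros survive


def strip_comment_lines(text: str) -> str:
    """Drop lines that consist solely of a // comment.

    Only whole-line comments are removed, so a "//" inside a string value
    (for example in a URL) is left untouched.
    """
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return "\n".join(kept)


def parse_record(text: str) -> Record:
    """Parse an annotated record; raises TypeError on missing or extra keys."""
    return Record(**json.loads(strip_comment_lines(text)))


record = parse_record(SAMPLE)
```

Keeping `gtin` as a string rather than a number is a deliberate choice in this sketch: the sample value begins with zeros, which an integer type would discard.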