Module: Assistant::BlockAddressedEditor

Defined in:
app/services/assistant/block_addressed_editor.rb

Overview

Stage 2 of the Sunny blog editor fix: replace fragile HTML find/replace
with stable block-ID addressing, Kadous's "edit trick" adapted for HTML.

The LLM sees a block_index like (IDs shortened here; real IDs have eight hex
digits, see BLOCK_ID_PATTERN):
b_a3f2 Real estate in Los Angeles...
b_b91c The Project
b_c44d

And emits ordered operations by block_id:
replace_block(block_id: "b_a3f2", html: "...")
replace_in_block(block_id: "b_a3f2", find: "/old-path", replace: "/new-path")
delete_block(block_id: "b_b91c")
insert_after(block_id: "b_c44d", html: "...")
insert_before(block_id: "b_a3f2", html: "...")
move_block(block_id: "b_c44d", after: "b_a3f2")
move_block(block_id: "b_c44d", before: "b_a3f2")
update_attr(block_id: "b_c44d", attr: "data-id", value: "10511")

Blocks are addressed by ID, never by whole-document string match, which
eliminates the ~40% patch-failure rate caused by whitespace/attribute drift
in patch_blog_post. replace_in_block is the one substring-based op, but it
is scoped to the inner HTML of a single addressed block, so a literal
find/replace cannot mismatch elsewhere in the document the way
patch_blog_post did. It lets the model fix links or phrases inside a large
block without reproducing the whole block's HTML, which was the failure
mode behind the truncated-HTML edit loop in convs 3105/3109.
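
To illustrate why scoping matters, here is a minimal sketch that models a
document as a hash of block ID to inner HTML. The blocks hash, the helper
below, and its choice of statuses are hypothetical stand-ins, not the
module's real implementation, which operates on a parsed HTML fragment.

```ruby
# Hypothetical model: block_id => inner HTML.
blocks = {
  'b_a3f2c1d0' => '<a href="/old-path">Listings</a>',
  'b_b91c77e2' => 'See also <a href="/old-path">the archive</a>'
}

# Replace a literal substring inside one addressed block only.
# Returns :applied, :not_found (no such block), or :invalid (find text absent).
# These return values are assumptions made for this sketch.
def replace_in_block(blocks, block_id, find, replace)
  return :not_found unless blocks.key?(block_id)
  return :invalid unless blocks[block_id].include?(find)

  blocks[block_id] = blocks[block_id].gsub(find, replace)
  :applied
end

status = replace_in_block(blocks, 'b_a3f2c1d0', '/old-path', '/new-path')
```

Both blocks contain /old-path, but only the addressed block changes; the
second block keeps its original link.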

Constant Summary

BLOCK_ID_ATTR =

Name of the HTML attribute that stores a block's ID.

'data-block-id'
BLOCK_ID_PATTERN =

Format of a valid block ID: "b_" followed by exactly eight lowercase hex
digits, anchored to the whole string.

/\Ab_[a-f0-9]{8}\z/
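
A quick check of what the pattern accepts. The \A and \z anchors mean the
whole string must be the ID, so prefixes, uppercase digits, and trailing
newlines are all rejected. SecureRandom.hex(4) is shown as one plausible way
to mint a conforming ID; how the module's own new_block_id generates IDs is
not shown on this page, so treat that line as an assumption.

```ruby
require 'securerandom'

BLOCK_ID_PATTERN = /\Ab_[a-f0-9]{8}\z/

candidates = [
  'b_a3f2c1d0',    # well-formed
  'b_a3f2',        # too short
  'b_A3F2C1D0',    # uppercase hex is not allowed
  'xb_a3f2c1d0',   # leading junk fails the \A anchor
  "b_a3f2c1d0\n"   # trailing newline fails the \z anchor
]
verdicts = candidates.to_h { |id| [id, BLOCK_ID_PATTERN.match?(id)] }

# Assumption: four random bytes rendered as eight lowercase hex characters.
minted = "b_#{SecureRandom.hex(4)}"
```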
VALID_OPS =

Operation names accepted by apply_ops; any other name is reported as "invalid".

%w[replace_block replace_in_block delete_block insert_after insert_before move_block update_attr].freeze
ADDRESSABLE_TAGS =

Tags eligible to receive a block_id. Inline tags and whitespace text
nodes are skipped. Wrapper tags in the list below are included
intentionally because some legacy posts wrap content in them.

%w[
  p h1 h2 h3 h4 h5 h6 ul ol blockquote figure pre table
  div section aside article hr dl
].to_set.freeze

Class Method Details

.apply_ops(html, ops, on_op: nil) ⇒ Hash

Apply ordered block-ID operations to HTML.

Parameters:

  • html (String)

    Source HTML (must already have block IDs assigned).

  • ops (Array<Hash>)

    Ordered operations.

  • on_op (Proc, nil) (defaults to: nil)

    Optional callback invoked per op with
    { op:, block_id:, status:, preview: } — used by Stage 9 streaming.

Returns:

  • (Hash)

    { html:, op_results: [...] }
    Each op_result: { index:, op:, block_id:, status:, detail: (optional) }
    status: "applied" | "not_found" | "invalid" | "duplicate_id"
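
A caller will typically split op_results by status to decide what to report
back to the model. The sketch below uses a hand-written result hash in the
documented shape; the values and the summarize helper are illustrative and
are not output captured from the real method.

```ruby
# Hand-written example in the documented return shape.
result = {
  html: '<p data-block-id="b_a3f2c1d0">Updated</p>',
  op_results: [
    { index: 0, op: 'replace_block', block_id: 'b_a3f2c1d0', status: 'applied' },
    { index: 1, op: 'delete_block', block_id: 'b_ffffffff', status: 'not_found',
      detail: 'No block with data-block-id=b_ffffffff found in content.' }
  ]
}

# Hypothetical helper: count successes and describe each failure briefly.
def summarize(op_results)
  applied, failed = op_results.partition { |r| r[:status] == 'applied' }
  {
    applied_count: applied.size,
    failures: failed.map { |r| "op #{r[:index]} (#{r[:op]}): #{r[:status]}" }
  }
end

summary = summarize(result[:op_results])
```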



# File 'app/services/assistant/block_addressed_editor.rb', line 119

def apply_ops(html, ops, on_op: nil)
  ops_array = Array(ops)
  return { html: html.to_s, op_results: [] } if ops_array.empty?

  fragment = parse_fragment(html.to_s)
  valid_ids = collect_block_ids(fragment)
  results = []

  ops_array.each_with_index do |op, idx|
    op_name = (op[:op] || op['op']).to_s
    block_id = (op[:block_id] || op['block_id']).to_s
    entry = { index: idx, op: op_name, block_id: block_id }

    unless VALID_OPS.include?(op_name)
      entry[:status] = 'invalid'
      entry[:detail] = "Unknown op: #{op_name.inspect}. Valid ops: #{VALID_OPS.join(', ')}"
      results << entry
      notify(on_op, entry, fragment, nil)
      next
    end

    unless BLOCK_ID_PATTERN.match?(block_id)
      entry[:status] = 'invalid'
      entry[:detail] = "Invalid block_id format: #{block_id.inspect}. Must match /b_[a-f0-9]{8}/. " \
                       "Re-call get_blog_post and copy the block_id verbatim — never invent one."
      assign_did_you_mean!(entry, block_id, valid_ids)
      results << entry
      notify(on_op, entry, fragment, nil)
      next
    end

    target = find_block(fragment, block_id)
    unless target
      entry[:status] = 'not_found'
      entry[:detail] = "No block with data-block-id=#{block_id} found in content. " \
                       "Re-call get_blog_post to refresh the block_index — IDs change after every edit."
      assign_did_you_mean!(entry, block_id, valid_ids)
      results << entry
      notify(on_op, entry, fragment, nil)
      next
    end

    preview_node = nil
    begin
      preview_node = apply_single_op(fragment, target, op_name, op, entry)
    rescue ArgumentError => e
      entry[:status] = 'invalid'
      entry[:detail] = e.message
    end

    results << entry
    notify(on_op, entry, fragment, preview_node)
  end

  { html: serialize(fragment), op_results: results }
end
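
The validation order in apply_ops (unknown op, then malformed ID, then
missing block) can be sketched in isolation. The classify helper below is
hypothetical: it takes a plain set of known IDs instead of a parsed
fragment, and returns "ready" where the real method would go on to call
apply_single_op. It reads both symbol and string keys, as the method above
does.

```ruby
require 'set'

module BlockOpSketch
  VALID_OPS = %w[replace_block replace_in_block delete_block insert_after
                 insert_before move_block update_attr].freeze
  BLOCK_ID_PATTERN = /\Ab_[a-f0-9]{8}\z/

  # Return the status an op would receive before any mutation is attempted.
  def self.classify(op, known_ids)
    op_name = (op[:op] || op['op']).to_s
    block_id = (op[:block_id] || op['block_id']).to_s

    return 'invalid' unless VALID_OPS.include?(op_name)
    return 'invalid' unless BLOCK_ID_PATTERN.match?(block_id)
    return 'not_found' unless known_ids.include?(block_id)

    'ready' # sketch-only status: the op passed every pre-check
  end
end

known = Set['b_a3f2c1d0']
statuses = [
  { op: 'delete_block', block_id: 'b_a3f2c1d0' },          # passes all checks
  { 'op' => 'delete_block', 'block_id' => 'b_00000000' },  # well-formed but absent
  { op: 'delete_block', block_id: 'b_a3f2' },              # malformed ID
  { op: 'rename_block', block_id: 'b_a3f2c1d0' }           # unknown op
].map { |op| BlockOpSketch.classify(op, known) }
```

Because each failed check records a result and moves on to the next op, one
bad op does not stop the rest of the batch from being attempted.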

.assign_ids!(html) ⇒ String

Parse HTML, assign a stable b_<8hex> data-block-id to every top-level
block that lacks a valid one, and return the serialized HTML.
Idempotent: existing valid IDs are preserved, while missing or malformed
ones are replaced. Returns an empty string for nil or blank input.

Parameters:

  • html (String)

    The blog post body HTML.

Returns:

  • (String)

    HTML with data-block-id on every top-level block.



# File 'app/services/assistant/block_addressed_editor.rb', line 57

def assign_ids!(html)
  return '' if html.nil? || html.to_s.strip.empty?

  fragment = parse_fragment(html.to_s)
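  top_level_blocks(fragment).each do |node|
    # Keep any ID that already matches the expected format (idempotency).
    existing = node[BLOCK_ID_ATTR].to_s
    next if BLOCK_ID_PATTERN.match?(existing)

    # Missing or malformed ID: mint a fresh one.
    node[BLOCK_ID_ATTR] = new_block_id
  end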
  serialize(fragment)
end
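
The idempotency guarantee can be demonstrated without an HTML parser. In
this sketch each block is a plain hash of attributes, a hypothetical
stand-in for the parsed nodes, and SecureRandom.hex(4) is an assumed ID
generator rather than the module's own new_block_id.

```ruby
require 'securerandom'

ID_PATTERN = /\Ab_[a-f0-9]{8}\z/

# Return a new list in which every node carries a valid data-block-id.
def assign_ids_sketch(nodes)
  nodes.map do |node|
    next node if ID_PATTERN.match?(node['data-block-id'].to_s)

    node.merge('data-block-id' => "b_#{SecureRandom.hex(4)}")
  end
end

nodes = [
  { 'tag' => 'p', 'data-block-id' => 'b_a3f2c1d0' }, # valid, kept as-is
  { 'tag' => 'h2' },                                  # missing, gets a new ID
  { 'tag' => 'p', 'data-block-id' => 'b_XYZ' }        # malformed, replaced
]

first_pass = assign_ids_sketch(nodes)
second_pass = assign_ids_sketch(first_pass)
```

A second pass finds only valid IDs and changes nothing, which is what makes
it safe to call the method on content that has already been processed.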

.block_html(html, block_id) ⇒ String?

Return the full, untruncated outer HTML of a single addressed block, or
nil when block_id is malformed or absent. Fetched on demand and used
immediately, so — unlike get_blog_post's solution field — it is immune
to the mid-turn context compaction that truncates large bodies down to a
few hundred chars on later turns (the <figcaption class="figu… cutoff
that stalled convs 3105/3109). Lets the model read a block's exact
current markup before a replace_block / replace_in_block edit.

Parameters:

  • html (String)

    Serialized post body HTML (already block-ID assigned).

  • block_id (String)

    Target block_id (format: b_xxxxxxxx).

Returns:

  • (String, nil)

    The block's outer HTML, or nil if not found.



# File 'app/services/assistant/block_addressed_editor.rb', line 102

def block_html(html, block_id)
  return nil if html.nil? || html.to_s.strip.empty?
  return nil unless BLOCK_ID_PATTERN.match?(block_id.to_s)

  fragment = parse_fragment(html.to_s)
  find_block(fragment, block_id.to_s)&.to_html
end

.block_index(html) ⇒ Array<Hash>

Build a compact index of top-level blocks with a short preview.
This is what the LLM consumes to choose block_ids to target —
much smaller than the 40k-char truncated full body.

Parameters:

  • html (String)

    Serialized HTML (already passed through assign_ids!).

Returns:

  • (Array<Hash>)

    [{ block_id:, tag:, preview:, kind: (optional), embed_id: (optional) }]



# File 'app/services/assistant/block_addressed_editor.rb', line 76

def block_index(html)
  return [] if html.nil? || html.to_s.strip.empty?

  fragment = parse_fragment(html.to_s)
  top_level_blocks(fragment).map do |node|
    entry = {
      block_id: node[BLOCK_ID_ATTR].to_s,
      tag: node.name
    }
    entry[:preview] = build_preview(node)
    decorate_embed_metadata!(entry, node)
    entry
  end
end
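
The build_preview helper is not shown on this page. As a rough sketch of
what a preview could look like, the hypothetical function below strips
tags, collapses whitespace, and truncates; the 40-character limit and the
trailing "..." are assumptions chosen for the example.

```ruby
# Hypothetical preview builder: strip tags, collapse whitespace, truncate.
def preview_sketch(block_html, limit: 40)
  text = block_html.gsub(/<[^>]*>/, ' ').gsub(/\s+/, ' ').strip
  text.length > limit ? "#{text[0, limit]}..." : text
end

# Entries in the documented shape, minus the optional embed metadata.
index = [
  { block_id: 'b_a3f2c1d0', tag: 'p',
    preview: preview_sketch('<p>Real estate in <strong>Los Angeles</strong> keeps changing.</p>') },
  { block_id: 'b_b91c77e2', tag: 'h2',
    preview: preview_sketch('<h2>The Project</h2>') }
]
```

Short previews keep the index small enough for the model to scan every
block and pick an ID without loading the full body.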