  1. Does past_key_values be repeatedly compute? - GitHub

    Oct 19, 2023 · Hi! Attention sink is amazing for LLMs. I am confused about past_key_values in streaming-llm. In my understanding, past_key_values would be recomputed for every new input. But I notice …

  2. streaming-llm/streaming_llm/pos_shift/modify_llama.py at main - GitHub

    query_states = apply_rotary_pos_emb_single(query_states, cos, sin, position_ids)
    if past_key_value is not None:
        # reuse k, v, self_attention
        key_states = torch.cat([past_key_value[0], …
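
    The snippet above shows the standard KV-cache reuse pattern: key/value tensors cached from earlier steps are concatenated with the new step's projections, so attention over past tokens is never recomputed. A minimal sketch of that pattern, using NumPy in place of torch and hypothetical shapes (single head, no masking) for illustration:

    ```python
    import numpy as np

    def attend_with_cache(q_new, k_new, v_new, past_kv=None):
        """Append the new step's K/V to the cache, then attend over the full history.

        q_new, k_new, v_new: (seq_new, d) projections for the new tokens only.
        past_kv: optional (K_cache, V_cache) tuple from previous steps.
        """
        if past_kv is not None:
            # Reuse cached keys/values instead of recomputing them.
            k_new = np.concatenate([past_kv[0], k_new], axis=0)
            v_new = np.concatenate([past_kv[1], v_new], axis=0)
        scores = q_new @ k_new.T / np.sqrt(q_new.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out = weights @ v_new
        return out, (k_new, v_new)  # updated cache for the next step

    # Two decoding steps: the second passes only one new token and reuses the cache.
    d = 4
    q1, k1, v1 = (np.random.randn(3, d) for _ in range(3))
    _, cache = attend_with_cache(q1, k1, v1)
    q2, k2, v2 = (np.random.randn(1, d) for _ in range(3))
    out, cache = attend_with_cache(q2, k2, v2, past_kv=cache)
    print(cache[0].shape)  # cache now holds all 4 keys
    ```

    This is why, in the issue above, past_key_values is not repeatedly computed: each forward pass only projects the new tokens and concatenates them onto the cache.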

  3. streaming-llm/data/mt_bench.jsonl at main - GitHub

    [ICLR 2024] Efficient Streaming Language Models with Attention Sinks - streaming-llm/data/mt_bench.jsonl at main · mit-han-lab/streaming-llm