About this Talk
Accurately modeling module relationships in a codebase is crucial for transitioning manual development tasks into agentic workflows. This session presents research on leveraging graph-based representations of source code to create structured, persistent memory of codebases for developer-agents.
We will examine the value of systematically extracting relational metadata from source code using Abstract Syntax Trees (ASTs), as well as pandas-based data aggregation methods for managing large-scale relational data. Further, we discuss how graph theory techniques, including centrality measures and clustering (implemented with NetworkX), can be employed to identify critical software modules and dependencies.
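As a rough illustration of the extraction step described above, the sketch below parses a few in-memory modules with Python's standard `ast` module, collects import edges, and ranks modules by in-degree centrality. The module names and sources are hypothetical, and the centrality is computed by hand (in-degree divided by n - 1, the same normalization NetworkX uses for `in_degree_centrality`) so the example has no third-party dependencies. It is a minimal sketch under those assumptions, not the pipeline presented in the talk.

```python
import ast
from collections import defaultdict

# Hypothetical toy "repository": module name -> source text (illustration only).
SOURCES = {
    "app": "import db\nimport utils\nfrom api import handler\n",
    "api": "import db\nfrom utils import slugify\n",
    "db": "import utils\n",
    "utils": "import os\n",
}


def extract_import_edges(sources):
    """Return sorted (importer, imported) pairs for imports that resolve inside `sources`."""
    edges = set()
    for module, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Absolute "from X import Y" only; relative imports are skipped here.
                targets = [node.module.split(".")[0]]
            else:
                continue
            for target in targets:
                # Keep only edges between modules of the toy repository.
                if target in sources and target != module:
                    edges.add((module, target))
    return sorted(edges)


def in_degree_centrality(nodes, edges):
    """In-degree of each node divided by (n - 1)."""
    counts = defaultdict(int)
    for _, target in edges:
        counts[target] += 1
    denom = len(nodes) - 1
    return {node: (counts[node] / denom if denom > 0 else 0.0) for node in nodes}


EDGES = extract_import_edges(SOURCES)
CENTRALITY = in_degree_centrality(sorted(SOURCES), EDGES)

# List modules from most to least depended-upon.
for name, score in sorted(CENTRALITY.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In a larger codebase, the edge list could be loaded into a pandas DataFrame for aggregation and into a NetworkX `DiGraph` for richer measures such as betweenness centrality or community detection.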
This session references and builds upon previous research, including RepoAgent (arXiv:2402.16667), ContextModule (arXiv:2412.08063), and open-source projects such as Potpie and Blarify, highlighting differences, strengths, and areas for improvement.
Finally, we'll explore integrating the memory model with an agentic architecture, focusing on fine-tuning and parameter stabilization so that module summaries stay consistent and agent-to-agent communication improves.
- Learn precise methodologies to systematically extract and represent semantic and syntactic relationships within complex codebases using graph theory and AST-derived approaches.
- Understand strategies to mitigate variability in model outputs, ensuring consistent results.
- Practical applications: empirical evaluations, including reproducibility challenges and solutions.