---
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT
---
# Code Similarity Visualization with GraphCodeBERT

This interactive application visualizes token-level embeddings generated by [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representations in the model's embedding space, using PCA for dimensionality reduction.
## ✒️ Reference

Martinez-Gil, J. (2025).
**Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks**.
*International Journal of Software Engineering and Knowledge Engineering*, 35(05), 657–678.
## 🚀 Features

- Select two classical sorting algorithms.
- Automatic tokenization and embedding via GraphCodeBERT.
- PCA-based projection into 2D space for visualization.
- Clear matplotlib plots showing token-level distribution differences.
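
The tokenization and embedding step could look roughly like the sketch below. It assumes the Hugging Face `transformers` API (`AutoTokenizer`, `AutoModel`) and uses the model's last hidden state as the token embeddings. The helper names `load_graphcodebert` and `embed_tokens` are illustrative and are not taken from `app.py`. The sketch feeds only the code tokens to the model and does not build the data-flow inputs that GraphCodeBERT was pretrained with.

```python
MODEL_NAME = "microsoft/graphcodebert-base"


def load_graphcodebert(model_name=MODEL_NAME):
    """Load tokenizer and model (hypothetical helper; downloads weights on first use)."""
    # Imported lazily so the heavy libraries load only when the model is needed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed_tokens(code, tokenizer, model, max_length=512):
    """Return (tokens, embeddings); embeddings has shape (num_tokens, hidden_size)."""
    import torch

    inputs = tokenizer(
        code, return_tensors="pt", truncation=True, max_length=max_length
    )
    with torch.no_grad():  # no gradients are needed for visualization
        outputs = model(**inputs)
    # last_hidden_state has shape (1, num_tokens, hidden_size); drop the batch axis.
    embeddings = outputs.last_hidden_state[0].numpy()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, embeddings
```

With `tokenizer, model = load_graphcodebert()`, a call such as `embed_tokens(source_code, tokenizer, model)` returns one vector per subword token, including the special `<s>` and `</s>` tokens that the tokenizer adds.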
## 🧠 Technical Overview

- **Model**: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- **Embedding Layer**: Last hidden state
- **Reduction**: Principal Component Analysis (PCA)
- **Interface**: Gradio
- **Language**: Python 3.10+
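
As a rough illustration of the reduction step, the sketch below projects two token-embedding matrices into 2D with scikit-learn's `PCA` and draws them with matplotlib. Fitting a single PCA on both matrices stacked together is an assumption made here so that the two algorithms share one coordinate system; the application may fit PCA differently. The names `project_pair` and `plot_pair`, and the output path, are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_pair(emb_a, emb_b, n_components=2):
    """Fit one PCA on both embedding matrices and return their projections."""
    stacked = np.vstack([emb_a, emb_b])
    # Assumption: a joint fit places both snippets in the same coordinate system.
    reduced = PCA(n_components=n_components).fit_transform(stacked)
    return reduced[: len(emb_a)], reduced[len(emb_a):]


def plot_pair(points_a, points_b, label_a, label_b, path="comparison.png"):
    """Scatter-plot both 2D point sets and save the figure to `path`."""
    # Imported here so project_pair can be used without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(points_a[:, 0], points_a[:, 1], label=label_a, alpha=0.7)
    ax.scatter(points_b[:, 0], points_b[:, 1], label=label_b, alpha=0.7)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path
```

Each input is expected to be a `(num_tokens, hidden_size)` array, for example the last hidden state for one sorting algorithm. The two inputs may have different numbers of tokens.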
## 🛠 Dependencies

All required libraries are listed in `requirements.txt`:
| ``` | |
| transformers | |
| torch | |
| scikit-learn | |
| numpy | |
| matplotlib | |
| gradio | |
| Pillow | |
| ``` | |
## 🖥️ Intended Use

- Academic teaching and demonstration of code embeddings
- Qualitative evaluation of pretrained models for source code
- Supplementary visualization for software engineering publications
## 📬 Contact

**Jorge Martinez-Gil**
Senior Research Scientist in Computer Science