Spaces:

MusoraProductDepartment
/

Sentiment_analysis

Sleeping

App Files Files Community

Sentiment_analysis / visualization /README.md

Danialebrat

Adding Learning path page and improving HelpScout dashboard

599973c 12 days ago

preview code

raw

history blame contribute delete

23 kB

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade

Musora Sentiment Analysis Dashboard

A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter), the Musora internal app, HelpScout customer support conversations, and Learning Paths lesson engagement data across brands (Drumeo, Pianote, Guitareo, Singeo, Playbass).

Project Structure
How Data Flows
Data Loading Strategy
Pages
Global Filters & Session State
Snowflake Queries
Authentication
PDF Reports
AI Agents
Adding or Changing Things
Running the App
Configuration Reference

Project Structure

visualization/
├── app.py                              # Entry point — routing, sidebar, session state
├── config/
│   └── viz_config.json                 # Colors, query strings, dashboard settings
├── data/
│   ├── data_loader.py                  # Comment Snowflake queries and caching
│   └── helpscout_data_loader.py        # HelpScout Snowflake queries and caching
├── utils/
│   ├── auth.py                         # Login page, authentication helpers
│   ├── data_processor.py               # Pandas aggregations (intent dist, content summary, etc.)
│   ├── metrics.py                      # KPI calculations (sentiment score, urgency, etc.)
│   ├── pdf_exporter.py                 # DashboardPDFExporter (comment dashboard PDF)
│   ├── helpscout_utils.py              # Pure helpers: parse_topics, explode_topics, boolean_flag_counts
│   └── helpscout_pdf.py                # HelpScoutDashboardPDF + HelpScoutAnalysisPDF
├── components/
│   ├── dashboard.py                    # Comment Dashboard page renderer
│   ├── sentiment_analysis.py           # Sentiment Analysis page renderer
│   ├── reply_required.py               # Reply Required page renderer
│   ├── helpscout_dashboard.py          # HelpScout Dashboard page + compact summary widget
│   └── helpscout_analysis.py           # HelpScout Analysis page (filter→fetch→charts→LLM→PDF)
├── visualizations/
│   ├── sentiment_charts.py             # Plotly sentiment chart functions
│   ├── distribution_charts.py          # Plotly distribution / heatmap / scatter functions
│   ├── demographic_charts.py           # Plotly demographic chart functions
│   ├── content_cards.py                # Streamlit card components (comment + content cards)
│   └── helpscout_charts.py             # HelpScoutCharts Plotly factory (16 chart types)
├── agents/
│   ├── base_agent.py                   # BaseVisualizationAgent (shared interface)
│   ├── content_summary_agent.py        # AI analysis for comment content summarisation
│   └── helpscout_summary_agent.py      # HelpScoutSummaryAgent — page-level LLM summary from SUMMARY fields
├── img/
│   └── musora.png                      # Sidebar logo
└── SnowFlakeConnection.py              # Snowflake connection wrapper (Snowpark session)

How Data Flows

Snowflake
    │
    ├── data_loader.py (SentimentDataLoader)
    │       ├── load_dashboard_data()        ──► st.session_state['dashboard_df']
    │       │                                       └─► sidebar (filter options, counts)
    │       │                                       └─► dashboard.py (all charts)
    │       ├── load_sa_data()               ──► st.session_state['sa_contents', 'sa_comments']
    │       │   (on-demand, Fetch button)           └─► sentiment_analysis.py
    │       └── load_reply_required_data()   ──► st.session_state['rr_df']
    │           (on-demand, Fetch button)           └─► reply_required.py
    │
    └── helpscout_data_loader.py (HelpScoutDataLoader)
            ├── load_dashboard_data()        ──► st.session_state['helpscout_df']
            │                                       └─► helpscout_dashboard.py
            │                                       └─► dashboard.py (compact summary)
            └── load_analysis_data()         ──► st.session_state['hs_analysis_df']
                (on-demand, Fetch button)           └─► helpscout_analysis.py

Key principle: Data is loaded as little as possible, as late as possible.

Dashboard queries are lightweight (no text columns, no content join) and cached 24 hours.
Sentiment Analysis, Reply Required, and HelpScout Analysis pages wait for the user to click Fetch Data.
All data lives in st.session_state so page navigation and widget interactions never re-trigger Snowflake queries.

Data Loading Strategy

Comment data (`data/data_loader.py` — `SentimentDataLoader`)

`load_dashboard_data()`

Fetches only: comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id.
No text columns, no DIM_CONTENT join.
Merges demographics data if demographics_query is configured.
Cached 24 hours. Called once at startup; stored in st.session_state['dashboard_df'].

`load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, emotions, date_range)`

Runs two Snowflake queries:
1. Content aggregation — groups by content_sk, counts per sentiment, computes severity score, returns top N.
2. Sampled comments — up to 50 per sentiment group per content (QUALIFY ROW_NUMBER() <= 50). display_text computed in SQL.
Returns (contents_df, comments_df). Cached 24 hours.

`load_reply_required_data(platforms, brands, date_range)`

Filters REQUIRES_REPLY = TRUE. Conditionally includes the social media table and/or musora table. Cached 24 hours.

SQL column qualification note

The social media table and DIM_CONTENT share column names. Any WHERE clause inside a query that joins them must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP) to avoid Snowflake ambiguous column name errors.

HelpScout data (`data/helpscout_data_loader.py` — `HelpScoutDataLoader`)

`load_dashboard_data()`

Lightweight query from SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES.
Columns: conversation_id, status, source, created_at, updated_at, duration_hours, sentiment_polarity, topics, is_refund_request, is_cancellation, is_membership, customer_email.
Merges demographics (age/timezone/experience) via email join (LOWER(customer_email) = LOWER(usora_users.email)).
After the demographics merge, adds is_member boolean: True when the customer email matched a Musora user record, False otherwise.
Cached 24 hours. Stored in st.session_state['helpscout_df'].

`load_analysis_data(date_start, date_end, topics, sentiments, statuses, sources, is_refund, is_cancellation, is_membership)`

Adds summary, sentiment_notes, topic_notes, customer_first_name, customer_last_name columns.
SQL WHERE pushdown for all filters; multi-label topic filter uses ARRAY_CONTAINS('topic_id'::VARIANT, SPLIT(TOPICS, ',')).
Cached 24 hours keyed on filter tuple. Stored in st.session_state['hs_analysis_df'].

`get_filter_options(df)`

Returns sentiments, topics (exploded and label-mapped from taxonomy), statuses, states, sources.

Pages

The app has 5 pages navigated via the sidebar radio:

1. Sentiment Dashboard (`components/dashboard.py`)

Receives: filtered_df — lightweight comment dataframe (after optional global filter from app.py).

Key sections:

Summary stats + health indicator
Sentiment distribution (pie + gauge)
Sentiment by brand and platform (stacked + percentage bar charts)
Intent analysis (bar + pie)
Emotion analysis (bar + pie) — only when emotions column is non-null
Brand–Platform heatmap
Reply requirements + urgency breakdown
Demographics (age, timezone, experience) — only when demographics were merged
HelpScout compact summary — appended at bottom; reads st.session_state['helpscout_df'] directly (guarded by try/except so failures never break the main dashboard)

2. Custom Sentiment Queries (`components/sentiment_analysis.py`)

Receives: data_loader instance only.

Flow:

Reads st.session_state['dashboard_df'] for filter option lists.
Pre-populates platform/brand dropdowns from st.session_state['global_filters'].
On Fetch Data: calls data_loader.load_sa_data(...), stores results in st.session_state['sa_contents'] and ['sa_comments'].
Renders content cards, per-content sentiment + intent + emotion charts, AI analysis buttons, sampled comment expanders.

Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.

3. Reply Required (`components/reply_required.py`)

Receives: data_loader instance only.

Flow:

Pre-populates platform/brand/date from st.session_state['global_filters'].
On Fetch Data: calls data_loader.load_reply_required_data(...), stores result in st.session_state['rr_df'].
Shows urgency breakdown, in-page filters (applied in Python, no extra query), paginated comment cards, and "Reply by Content" summary.

Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.

4. HelpScout Dashboard (`components/helpscout_dashboard.py`)

Receives: helpscout_loader instance.

Reads from: st.session_state['helpscout_df'] (loaded at app startup).

Key sections:

Member status filter (radio at top): "All Customers / Members Only / Non-Members Only" — filters the entire dashboard view before any section renders
PDF export button (HelpScout Dashboard PDF)
6 KPI metrics: total conversations, average duration, refund requests, cancellations, negative rate, membership joins
Sentiment distribution (pie + bar)
Topic distribution and sentiment heatmap (from process_helpscout/config_files/topics.json taxonomy)
Boolean flags (refund, cancellation, membership) breakdown
Status and source breakdown
Timelines expander (daily conversation volume, refund/cancel trend)
Depth expander (topic co-occurrence, escalation funnel)
Member vs Non-Member section: KPI metrics (member count, non-member count, email match rate) + member share pie chart + sentiment by member status stacked bar + top topics by member status grouped bar
Demographics (age, timezone, experience)

Note: Global sidebar filters (brand, platform, sentiment, date) do not apply to HelpScout pages — HelpScout is brand-agnostic and uses its own filter panel.

5. HelpScout Analysis (`components/helpscout_analysis.py`)

Receives: helpscout_loader instance.

Flow:

Filter panel — date range, top_n, topics (multi-select with human-readable labels), sentiments, statuses, sources, 3 boolean checkboxes (refund / cancellation / membership), and a "Customer Type" selectbox (All / Members Only / Non-Members Only).
Fetch Data button — calls helpscout_loader.load_analysis_data(...), stale-checked via fetch_key tuple. The Customer Type filter is not part of the Snowflake query — it is applied in Python after fetching, using the member email set derived from st.session_state['helpscout_df'].
KPI row + distribution charts (sentiment, topics, flags, status).
Member vs Non-Member section — always rendered when member data is available; shows share pie, sentiment stacked bar, and top-topics grouped bar split by member status.
AI Summary section:
- "Generate AI Summary" button → calls HelpScoutSummaryAgent, stores result in st.session_state['hs_analysis_summary'].
- Renders: executive summary, top themes, top complaints, unexpected insights, notable quotes.
- "Export Analysis PDF" button → generates HelpScoutAnalysisPDF.
Paginated conversation cards — 10 per page; each card shows customer name, status, topics (label-mapped), summary, sentiment/topic notes.
CSV export button.

Pagination: st.session_state['hs_analysis_page']. Reset on new fetch.

Date range default: Clamps to max(min_date, max_date − default_date_range_days) so the default is always within the available data window.

Global Filters & Session State

Global filters apply only to comment pages (Dashboard, Sentiment Analysis, Reply Required). They have no effect on HelpScout pages.

st.session_state['global_filters'] = {
    'platforms':  ['facebook', 'instagram'],
    'brands':     ['drumeo'],
    'sentiments': [],
    'date_range': (date(2025, 1, 1), date(2025, 12, 31)),  # or None
}

Full session state key reference

Key	Set by	Used by
`dashboard_df`	`app.py` startup	sidebar, dashboard.py, SA + RR filter lists
`global_filters`	sidebar "Apply Filters"	app.py (dashboard filter), SA + RR pre-populate
`filters_applied`	sidebar buttons	app.py
`sa_contents`	SA fetch button	sentiment_analysis.py
`sa_comments`	SA fetch button	sentiment_analysis.py
`sa_fetch_key`	SA fetch button	SA stale-check
`rr_df`	RR fetch button	reply_required.py
`rr_fetch_key`	RR fetch button	RR stale-check
`sentiment_page`	SA page / fetch	SA pagination
`reply_page`	RR page / fetch	RR pagination
`content_summaries`	SA AI buttons	SA AI analysis display
`helpscout_df`	`app.py` startup	helpscout_dashboard.py (includes `is_member`), dashboard.py compact summary, helpscout_analysis.py member filter
`hs_analysis_df`	HS Analysis fetch	helpscout_analysis.py charts + cards
`hs_analysis_fetch_key`	HS Analysis fetch	HS Analysis stale-check
`hs_analysis_filter_desc`	HS Analysis fetch	human-readable filter string for PDF + agent
`hs_analysis_summary`	"Generate AI Summary"	HS Analysis summary renderer
`hs_analysis_summary_key`	"Generate AI Summary"	invalidated on re-fetch
`hs_analysis_page`	HS Analysis page / fetch	HS Analysis pagination

Snowflake Queries

Comment tables

Table	Platform	Notes
`SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES`	facebook, instagram, youtube, twitter	Needs `LEFT JOIN DIM_CONTENT` for `PERMALINK_URL`
`SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES`	musora_app	Has `PERMALINK_URL` and `THUMBNAIL_URL` natively

HelpScout table

Table	Notes
`SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES`	One row per conversation; multi-label topics in comma-separated `TOPICS` column

Static queries (in `viz_config.json`)

Key	Purpose
`dashboard_query`	Lightweight comment query — no text, no DIM_CONTENT join
`demographics_query`	Joins `usora_users` + `preprocessed.users` for age/timezone/experience
`helpscout.dashboard_query`	Lightweight HelpScout query (no SUMMARY/notes)
`helpscout.demographics_query`	Same demographics join, keyed on `customer_email`

Dynamic queries (built in `helpscout_data_loader.py`)

Method	Description
`_build_analysis_query()`	Full HelpScout query including SUMMARY/notes; multi-label topic filter via `ARRAY_CONTAINS`

Authentication

Module: utils/auth.py

AUTHORIZED_EMAILS allowlist + APP_TOKEN env var.
render_login_page() renders the login form and calls st.stop() when not authenticated.
Gate is placed at the top of app.py (after st.set_page_config, before data loaders).
Current user and logout button are shown in the sidebar.

Required env vars:

APP_TOKEN=<shared token>

PDF Reports

Comment Dashboard PDF (`utils/pdf_exporter.py` — `DashboardPDFExporter`)

Generated from the "Export PDF Report" expander at the top of the Dashboard page.

Sections: cover, executive summary, sentiment, brand, platform, intent, cross-dimensional, volume, reply requirements, demographics (optional), language (optional), HelpScout summary (if data loaded), data summary.

HelpScout Dashboard PDF (`utils/helpscout_pdf.py` — `HelpScoutDashboardPDF`)

Generated from the HelpScout Dashboard page. Sections: cover, KPI summary, sentiment, topics, flags & escalation, status & source, timelines, depth, member vs non-member (metrics + pie + sentiment bar + topic grouped bar), demographics.

HelpScout Analysis PDF (`utils/helpscout_pdf.py` — `HelpScoutAnalysisPDF`)

Generated from the "Export Analysis PDF" button on the HelpScout Analysis page (only available after an AI Summary has been generated).

Sections: cover, filter summary, KPI summary (including member/non-member counts when available), chart snapshots, member vs non-member breakdown (pie + sentiment bar + topic grouped bar), AI summary (executive summary, top themes, top complaints, unexpected insights, notable quotes), conversation cards sample, metadata.

Dependencies: fpdf2, kaleido (for Plotly PNG rendering at 3× scale).

AI Agents

`ContentSummaryAgent` (`agents/content_summary_agent.py`)

Summarises sampled comments for a single content item on the Sentiment Analysis page. Called per-content when the user clicks the AI analysis button. Results cached in st.session_state['content_summaries'].

`HelpScoutSummaryAgent` (`agents/helpscout_summary_agent.py`)

Produces a page-level executive report from the filtered HelpScout conversations by reading their pre-extracted SUMMARY fields through an LLM.

Stratified sample by sentiment_polarity — capped at max_conversations (default 300).
Builds aggregate context: sentiment breakdown, top topics, flag counts, average duration, then per-conversation summaries (capped at 250 chars each).
Prompt asks the LLM to surface patterns beyond the pre-tagged topics/sentiments.
Output structure:

{
    "executive_summary": "...",
    "top_themes": [{"theme": "...", "description": "...", "prevalence": "..."}],
    "top_complaints": ["..."],
    "unexpected_insights": ["..."],
    "notable_quotes": ["..."]
}

Uses LLMHelper.get_structured_completion() with up to 3 retries.

Adding or Changing Things

Add a new chart to the Comment Dashboard

Write the chart function in the appropriate visualizations/ file.
Call it from render_dashboard() in components/dashboard.py.

Add a new chart to the HelpScout Dashboard

Add the chart method to HelpScoutCharts in visualizations/helpscout_charts.py.
Call it from render_helpscout_dashboard() in components/helpscout_dashboard.py.

Add a new HelpScout filter

Add the widget to the filter panel in helpscout_analysis.py.
Include the new value in the fetch_key tuple.
Add the corresponding WHERE clause condition to _build_analysis_query() in helpscout_data_loader.py.

Python-side filters (those whose data is not in the Snowflake HelpScout table) are applied after fetching rather than in SQL. The member/non-member filter is the canonical example: is_member is derived from st.session_state['helpscout_df'] after the Snowflake fetch. Such filters should not be included in the fetch_key tuple.

Add a new HelpScout topic

Edit process_helpscout/config_files/topics.json (the taxonomy file).
helpscout_utils.load_topic_taxonomy() reloads it on each app start; no other changes needed.

Change the cache duration

@st.cache_data(ttl=86400) appears on load_dashboard_data, _fetch_sa_data, _fetch_rr_data, load_demographics_data, and their HelpScout equivalents. Change 86400 to the desired TTL. Users can always force a refresh with "Reload Data" in the sidebar.

Add a new page

Create components/new_page.py with a render_new_page(...) function.
Import and add a radio option in app.py.
Add data loading to the appropriate loader class.
If the page should be excluded from global comment filters, extend the _hs_page guard in app.py.

Change what the Sentiment Analysis page queries

Edit _build_sa_content_query() and/or _build_sa_comments_query() in data_loader.py.
Update _process_sa_content_stats() and/or _process_sa_comments() for new columns.

Running the App

# From the project root
streamlit run visualization/app.py

Required environment variables (in .env at project root):

SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA
OPENAI_API_KEY
APP_TOKEN

Configuration Reference

config/viz_config.json controls:

Section	What it configures
`color_schemes.sentiment_polarity`	Hex colors for each sentiment level
`color_schemes.intent`	Hex colors per intent label
`color_schemes.emotion`	Hex colors per emotion label
`color_schemes.platform`	Hex colors per platform
`color_schemes.brand`	Hex colors per brand
`color_schemes_helpscout.topics`	Hex colors for HelpScout topic bars
`color_schemes_helpscout.status`	Hex colors for conversation status values
`color_schemes_helpscout.boolean_flags`	Hex colors for refund/cancellation/membership flags
`sentiment_order`	Display order for sentiment categories
`intent_order`	Display order for intent categories
`emotion_order`	Display order for emotion categories
`negative_sentiments`	Which sentiment values count as "negative"
`dashboard.default_date_range_days`	Default date filter window for comment pages
`helpscout.default_date_range_days`	Default date filter window for HelpScout Analysis
`helpscout.max_summary_conversations`	Cap on conversations sent to LLM summary agent
`helpscout.escalation_sentiments`	Sentiment values that count as escalation
`snowflake.dashboard_query`	Lightweight comment dashboard query
`snowflake.demographics_query`	Demographics join query (comment pages)
`helpscout.dashboard_query`	Lightweight HelpScout dashboard query
`helpscout.demographics_query`	Demographics join query (HelpScout, keyed on email)
`demographics.age_groups`	Age bucket definitions (label → [min, max])
`demographics.experience_groups`	Experience bucket definitions
`demographics.top_timezones_count`	How many timezones to show in the geographic chart

Musora Sentiment Analysis Dashboard

Table of Contents

Project Structure

How Data Flows

Data Loading Strategy

Comment data (data/data_loader.py — SentimentDataLoader)

load_dashboard_data()

load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, emotions, date_range)

load_reply_required_data(platforms, brands, date_range)

SQL column qualification note

HelpScout data (data/helpscout_data_loader.py — HelpScoutDataLoader)

load_dashboard_data()

load_analysis_data(date_start, date_end, topics, sentiments, statuses, sources, is_refund, is_cancellation, is_membership)

get_filter_options(df)

Pages

1. Sentiment Dashboard (components/dashboard.py)

2. Custom Sentiment Queries (components/sentiment_analysis.py)

3. Reply Required (components/reply_required.py)

4. HelpScout Dashboard (components/helpscout_dashboard.py)

5. HelpScout Analysis (components/helpscout_analysis.py)

Global Filters & Session State

Full session state key reference

Snowflake Queries

Comment tables

HelpScout table

Static queries (in viz_config.json)

Dynamic queries (built in helpscout_data_loader.py)

Authentication

PDF Reports

Comment Dashboard PDF (utils/pdf_exporter.py — DashboardPDFExporter)

HelpScout Dashboard PDF (utils/helpscout_pdf.py — HelpScoutDashboardPDF)

HelpScout Analysis PDF (utils/helpscout_pdf.py — HelpScoutAnalysisPDF)

AI Agents

ContentSummaryAgent (agents/content_summary_agent.py)

HelpScoutSummaryAgent (agents/helpscout_summary_agent.py)

Adding or Changing Things

Add a new chart to the Comment Dashboard

Add a new chart to the HelpScout Dashboard

Add a new HelpScout filter

Add a new HelpScout topic

Change the cache duration

Add a new page

Change what the Sentiment Analysis page queries

Running the App

Configuration Reference

Comment data (`data/data_loader.py` — `SentimentDataLoader`)

`load_dashboard_data()`

`load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, emotions, date_range)`

`load_reply_required_data(platforms, brands, date_range)`

HelpScout data (`data/helpscout_data_loader.py` — `HelpScoutDataLoader`)

`load_dashboard_data()`

`load_analysis_data(date_start, date_end, topics, sentiments, statuses, sources, is_refund, is_cancellation, is_membership)`

`get_filter_options(df)`

1. Sentiment Dashboard (`components/dashboard.py`)

2. Custom Sentiment Queries (`components/sentiment_analysis.py`)

3. Reply Required (`components/reply_required.py`)

4. HelpScout Dashboard (`components/helpscout_dashboard.py`)

5. HelpScout Analysis (`components/helpscout_analysis.py`)

Static queries (in `viz_config.json`)

Dynamic queries (built in `helpscout_data_loader.py`)

Comment Dashboard PDF (`utils/pdf_exporter.py` — `DashboardPDFExporter`)

HelpScout Dashboard PDF (`utils/helpscout_pdf.py` — `HelpScoutDashboardPDF`)

HelpScout Analysis PDF (`utils/helpscout_pdf.py` — `HelpScoutAnalysisPDF`)

`ContentSummaryAgent` (`agents/content_summary_agent.py`)

`HelpScoutSummaryAgent` (`agents/helpscout_summary_agent.py`)