Step 1: Compile List of Supplier-Region Pairs

Purpose

Establish a foundational inventory of unique supplier-region pairs to define the assessment's scope, ensuring comprehensive global coverage while prioritizing regions with high emissions or data potential.

Detailed Procedures

Data Extraction: Import the provided spreadsheet of utilities and suppliers by country/state into a processing tool such as Excel or Python. For large datasets, script the import for efficiency (e.g., with the pandas library: import pandas as pd; df = pd.read_excel('spreadsheet.xlsx')).
Deduplication and Uniqueness: Apply filters or code to remove duplicates so that each pair is unique by supplier name and region (e.g., "EDF - France"). In Python, use df = df.drop_duplicates(subset=['supplier', 'region']). This call matches exact strings only, so variant spellings (e.g., "State Grid Corp" vs. "State Grid Corporation of China") survive it. Manually review a sample (e.g., 10%) for such edge cases and resolve them using standardized naming conventions from authoritative sources like IEA utility lists.
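The normalization step described above can be sketched in plain Python. This is a minimal illustration, not the procedure's prescribed implementation: the ALIASES table, the normalize_name and dedupe_pairs helpers, and the sample rows are all hypothetical, and a real alias table would be built from an authoritative utility list.

```python
import re
import unicodedata

# Hypothetical alias table: maps known variant spellings to one canonical name.
ALIASES = {
    "state grid corp": "State Grid Corporation of China",
    "state grid corporation of china": "State Grid Corporation of China",
}


def normalize_name(name: str) -> str:
    """Return a canonical supplier name for comparison purposes."""
    # Strip accents and punctuation, collapse whitespace, and lowercase
    # to build a lookup key for the alias table.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    key = re.sub(r"[^a-z0-9 ]+", " ", ascii_name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Fall back to the whitespace-cleaned original when no alias is known.
    return ALIASES.get(key, " ".join(name.split()))


def dedupe_pairs(pairs):
    """Keep the first occurrence of each (supplier, region) pair after normalization."""
    seen = set()
    unique = []
    for supplier, region in pairs:
        canonical = (normalize_name(supplier), " ".join(region.split()))
        comparison_key = (canonical[0].casefold(), canonical[1].casefold())
        if comparison_key not in seen:
            seen.add(comparison_key)
            unique.append(canonical)
    return unique


# Illustrative rows only.
pairs = [
    ("EDF", "France"),
    ("edf ", "France"),
    ("State Grid Corp", "China"),
    ("State Grid Corporation of China", "China"),
]
unique_pairs = dedupe_pairs(pairs)
```

With the sample rows, the four inputs collapse to two pairs. The same normalize_name function could be applied to a DataFrame column before calling drop_duplicates.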
Prioritization: Rank pairs by criteria such as emission intensity, SSS relevance, and data richness. Calculate priority scores using a weighted formula (e.g., Score = 0.5 * Emission_Rank + 0.3 * Regulatory_Status + 0.2 * Data_Availability, where ranks are derived from normalized values). Prioritize high-emission regions (e.g., concentrate the top 50% of ranked pairs on the US, EU, China, and India, based on IEA global emissions data). Challenge prioritization assumptions by testing alternative weightings (e.g., a sensitivity analysis with ±20% weight shifts to emphasize emerging markets like Southeast Asia) and check for bias (e.g., overemphasis on developed regions) by cross-verifying against World Bank energy access reports for equity.
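The weighted score and its sensitivity check might look like the sketch below. It rests on several assumptions that the text does not fix: inputs are already normalized to the range 0 to 1, higher values mean higher priority, and a "±20% shift" scales one weight by 0.8 or 1.2 before the weights are renormalized to sum to 1. The supplier names and values are placeholders, not real data.

```python
BASE_WEIGHTS = {"emission": 0.5, "regulatory": 0.3, "data": 0.2}

# Illustrative values only (not real data), already scaled to the range 0 to 1.
pairs = {
    "Supplier A - Region 1": {"emission": 0.9, "regulatory": 0.4, "data": 0.8},
    "Supplier B - Region 2": {"emission": 0.6, "regulatory": 0.9, "data": 0.5},
    "Supplier C - Region 3": {"emission": 0.3, "regulatory": 0.2, "data": 0.9},
}


def score(values, weights):
    """Weighted sum of the normalized criteria."""
    return sum(weights[k] * values[k] for k in weights)


def rank(pairs, weights):
    """Pair names ordered from highest to lowest priority score."""
    return sorted(pairs, key=lambda name: score(pairs[name], weights), reverse=True)


def perturbed_weights(base, criterion, shift):
    """Scale one weight by (1 + shift) and renormalize so the weights sum to 1."""
    raw = dict(base)
    raw[criterion] = base[criterion] * (1 + shift)
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


baseline = rank(pairs, BASE_WEIGHTS)

# Record every perturbation that changes the baseline ordering.
unstable = []
for criterion in BASE_WEIGHTS:
    for shift in (-0.2, 0.2):
        weights = perturbed_weights(BASE_WEIGHTS, criterion, shift)
        if rank(pairs, weights) != baseline:
            unstable.append((criterion, shift))
```

An empty unstable list suggests the ranking is robust to the tested shifts. Any entries point to the criterion whose weight deserves closer justification.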
Metadata Addition: Enrich the list with relevant attributes, such as estimated generation capacity, ownership type (e.g., IOU vs. public), and preliminary SSS indicators (e.g., presence of RPS from DSIRE database). Use code execution for batch additions (e.g., merge with external datasets via pandas joins). Triple-verify metadata accuracy by cross-checking against multiple sources (e.g., EIA Form 860 for US capacities, ENTSO-E for EU, NEA for China) and alternative methodologies (e.g., web searches for "[supplier] capacity [region] 2025" combined with mathematical extrapolation from historical trends, using a numerical library such as NumPy or SciPy for curve fitting, since SymPy is a symbolic algebra package).
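A batch enrichment via a pandas join could be sketched as follows. The column names (supplier, region, capacity_mw, ownership) and all rows are assumptions for illustration and do not reflect the schema of the source spreadsheet or of any external dataset.

```python
import pandas as pd

# Illustrative rows only; column names are assumptions.
pairs = pd.DataFrame(
    {
        "supplier": ["Supplier A", "Supplier B", "Supplier C"],
        "region": ["Region 1", "Region 2", "Region 3"],
    }
)
metadata = pd.DataFrame(
    {
        "supplier": ["Supplier A", "Supplier B"],
        "region": ["Region 1", "Region 2"],
        "capacity_mw": [1200.0, 450.0],
        "ownership": ["IOU", "public"],
    }
)

# A left join keeps every pair. validate raises an error if either side has
# duplicate keys, and indicator adds a "_merge" column showing which rows matched.
enriched = pairs.merge(
    metadata,
    on=["supplier", "region"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Pairs with no metadata match are candidates for a manual lookup.
missing = enriched.loc[enriched["_merge"] == "left_only", ["supplier", "region"]]
```

Keeping the unmatched rows visible, rather than dropping them with an inner join, makes gaps in the external source explicit before any cross-checking begins.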
Verification: Perform multi-angle checks: (1) Internal consistency (e.g., code to flag anomalies like mismatched regions); (2) External validation (e.g., compare a subset to independent lists from IRENA or RE100 members); (3) Logical scrutiny (e.g., challenge completeness by deliberately seeking underrepresented pairs via semantic searches on X for "emerging market utilities 2025", and test the assumption that the spreadsheet is exhaustive through probabilistic sampling). Triple-verify by re-running deduplication with alternative algorithms (e.g., fuzzy matching via Levenshtein distance) and independently recalculating priorities using different benchmarks (e.g., EDGAR emissions database vs. IEA). Document potential pitfalls like naming inconsistencies or regional fragmentation (e.g., US states vs. EU countries) and mitigate them with normalization rules.
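The fuzzy-matching re-check can be illustrated with a standard-library implementation of Levenshtein distance. This is a sketch under stated assumptions: the similarity normalization, the 0.85 threshold, and the sample names are illustrative choices, not values taken from the procedure.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after case folding."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def near_duplicates(names, threshold=0.85):
    """Return name pairs whose similarity meets the threshold, for manual review."""
    return [
        (x, y) for x, y in combinations(names, 2)
        if similarity(x, y) >= threshold
    ]


# Illustrative names only.
names = ["Electricite de France", "Electricité de France", "State Grid Corp", "Enel"]
flagged = near_duplicates(names)
```

Edit distance catches typos and accent variants well, but abbreviations score low: "State Grid Corp" and "State Grid Corporation of China" have a similarity of roughly 0.48 under this measure, so such cases still depend on naming rules or manual review.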
Uncertainties and Mitigations: Address uncertainties such as incomplete spreadsheet data by supplementing with public databases (e.g., add missing pairs from EIA utility directories). Handle less common scenarios like supplier mergers (e.g., verified via recent news snippets) by flagging them for Step 2 review. Reconsider the entire step from scratch: re-import the data, re-deduplicate, and re-prioritize independently to confirm no oversights, explicitly noting any logical gaps (e.g., underrepresentation of Africa, mitigated by adding 10% buffer pairs from World Bank sources).
