GTPS-CV Methodology v1.0

Geo-Trust Progressive Sampling with Cross-Validation

Version: 1.0
Status: Production-Ready Framework
Last Updated: January 2026

Executive Summary

GTPS-CV is a field-driven survey methodology designed to produce representative, manipulation-resistant data in environments where digital surveys are vulnerable to farming, outsourcing, or coordinated bias. Version 1.0 introduces structured sampling protocols, expanded demographic capture, situational location classification, and statistical weighting frameworks to achieve population representativeness.


Core Principles

  1. Trust earned through performance — not inherited or assumed
  2. Geographic diversity as a quality signal — preventing regional capture
  3. Cross-validation among coordinators — detecting anomalies through comparison
  4. Stratified sampling with post-stratification weighting — achieving representativeness
  5. Diversity over volume — rewarding coverage breadth, not response counts

Part 1: Coordinator Structure and Trust System

1.1 Trust Hierarchy

The platform operates through a hierarchical coordinator network:

| Tier | Trust Range | Appointment | Capacity |
|---|---|---|---|
| Tier 1 (Anchor) | 90–100 | Site-appointed | 10–20 coordinators |
| Tier 2 | 70–89 | Recruited by Tier 1 | Up to 10 per parent |
| Tier 3 | 50–69 | Recruited by Tier 2 | Up to 10 per parent |
| Tier 4 (Field) | 30–49 | Recruited by Tier 3 | Up to 10 per parent |

1.2 Trust Initialization

New coordinators are initialized at:

Initial Trust = min(Parent Trust × 0.85, Tier Maximum) - Admin Adjustment

Where:
- Parent Trust × 0.85 creates 15% minimum decay
- Tier Maximum caps trust at tier ceiling
- Admin Adjustment allows manual reduction (0–20 points)
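The initialization rule can be expressed as a short Python sketch (the function name and signature are illustrative, not part of the specification):

```python
def initial_trust(parent_trust: float, tier_maximum: float,
                  admin_adjustment: float = 0.0) -> float:
    """Initialize a new coordinator's trust score.

    Implements: min(Parent Trust x 0.85, Tier Maximum) - Admin Adjustment.
    """
    if not 0.0 <= admin_adjustment <= 20.0:
        raise ValueError("admin adjustment must be 0-20 points")
    # The 0.85 multiplier enforces the 15% minimum decay per tier
    return min(parent_trust * 0.85, tier_maximum) - admin_adjustment
```

For example, a Tier 2 recruit under a Tier 1 parent with trust 90 starts at min(76.5, 89) = 76.5.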

1.3 Trust Dynamics

Trust scores update after each survey period based on four factors:

| Factor | Weight | Measurement |
|---|---|---|
| Cross-validation consistency | 35% | Deviation from area peers |
| Geographic cluster diversity | 30% | Unique location clusters covered |
| Demographic cluster diversity | 20% | Spread across demographic cells |
| Protocol compliance | 15% | Geo-fence adherence, timing rules |

Trust Update Formula:

New Trust = Current Trust + (Performance Score - 50) × Learning Rate × Trust Modifier

Where:
- Performance Score = weighted sum of factors (0–100)
- Learning Rate = 0.1 (slow adjustment)
- Trust Modifier = 1.0 for Trust > 70, 1.2 for Trust ≤ 70 (faster recovery)

Trust is bounded: minimum 10, maximum 100. Coordinators falling below 20 are flagged for review.
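The update and its bounds can be sketched in Python (the function name is illustrative; learning rate, modifier, and bounds follow the definitions above):

```python
def update_trust(current_trust: float, performance_score: float,
                 learning_rate: float = 0.1) -> float:
    """Update a coordinator's trust score after a survey period.

    performance_score is the weighted sum of the four factors (0-100).
    """
    # Trust Modifier: 1.2 at or below 70 for faster recovery
    modifier = 1.0 if current_trust > 70 else 1.2
    new_trust = current_trust + (performance_score - 50) * learning_rate * modifier
    # Trust is bounded to [10, 100]
    return max(10.0, min(100.0, new_trust))
```

A coordinator at trust 80 scoring 70 moves to 82; one at 60 scoring 30 drops to 57.6.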


Part 2: Sampling Framework

2.1 Location Selection Protocol

Representativeness begins with systematic location selection, not convenience.

Primary Sampling Units (PSUs)

Each survey defines PSUs based on:

  • Administrative boundaries (districts, municipalities)
  • Population density zones
  • Known demographic distributions

Location Assignment

Coordinators receive location assignments through:

  1. Randomized Grid Assignment: Geographic area divided into grid cells; coordinators assigned random cells
  2. Quota-Based Distribution: Cells weighted by population; more coordinators assigned to denser areas
  3. Rotation Schedule: Assignments rotate weekly to prevent familiarity bias

2.2 Respondent Selection Protocol

Within assigned locations, coordinators follow systematic selection:

Time-Interval Sampling

  • Begin collection at assigned start time
  • Approach first eligible adult after arrival
  • After each completion, wait 3 minutes before next approach
  • Continue until quota met or time window closes

Systematic Selection Rules

  • Approach adults (18+) who appear available
  • If refused, wait 1 minute, approach next eligible person
  • No targeting based on appearance, dress, or perceived demographics
  • Record all refusals for response rate calculation

2.3 Situational Location Classification

Every response captures situational context to enable stratification and bias detection.

Location Categories

| Category Code | Location Type | Expected Demographics |
|---|---|---|
| RES-APT | Apartment complex | Mixed urban |
| RES-HSE | Residential house area | Suburban/family |
| EDU-SCH | School vicinity | Parents, staff |
| EDU-COL | College/University | Young adults 18–25 |
| TRN-STA | Train/Metro station | Commuters, mixed |
| TRN-RDE | Train/Metro ride | Commuters, mixed |
| BUS-STP | Bus stop | Mixed, lower-middle income |
| BUS-RDE | Bus ride | Mixed, lower-middle income |
| COM-MLL | Shopping mall | Consumers, mixed |
| COM-MKT | Street market/bazaar | Local community |
| COM-CAF | Cafe/Restaurant | Urban, varied income |
| REL-MOS | Mosque vicinity | Muslim community |
| REL-CHR | Church vicinity | Christian community |
| REL-TMP | Temple vicinity | Hindu/Buddhist community |
| REL-OTH | Other religious site | Varies |
| REC-PRK | Public park | Families, recreation |
| REC-SPT | Sports facility | Active adults |
| WRK-OFF | Office district | White-collar workers |
| WRK-IND | Industrial area | Blue-collar workers |
| GOV-OFF | Government office | Citizens, bureaucrats |
| HLT-HSP | Hospital vicinity | Patients, caregivers |
| OTH | Other (specify) | Requires description |

Urban/Rural Classification

| Code | Definition | Criteria |
|---|---|---|
| URB-1 | Metro urban | City population > 1 million |
| URB-2 | Urban | City population 100K–1M |
| URB-3 | Semi-urban | Town population 20K–100K |
| RUR-1 | Rural town | Population 5K–20K |
| RUR-2 | Rural village | Population < 5K |

Part 3: Data Collection Specification

3.1 Geographic Data (Geo-Cluster)

Every response captures a three-part geo-cluster:

| Component | Field | Source | Required |
|---|---|---|---|
| Physical | GPS Coordinates | Browser GPS | Yes |
| Situational | Location Type | Coordinator selection | Yes |
| Settlement | Urban/Rural Code | Derived + Coordinator | Yes |

GPS Requirements:

  • Accuracy threshold: ≤ 50 meters
  • Responses without GPS are flagged as "unverified"
  • Unverified responses weighted at 50% in analysis
  • More than 20% unverified from a coordinator triggers review
  • IP geolocation used only for cross-validation, never as primary
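These rules reduce to a small verification helper. A minimal sketch follows; treating responses whose reported accuracy exceeds the threshold as unverified is an assumption, and the function names are illustrative:

```python
from typing import Optional

def gps_status(accuracy_m: Optional[float], threshold_m: float = 50.0):
    """Classify one response and return (status, analysis_weight).

    Missing GPS, or accuracy above the threshold, makes a response
    "unverified", which is weighted at 50% in analysis.
    """
    if accuracy_m is None or accuracy_m > threshold_m:
        return "unverified", 0.5
    return "verified", 1.0

def needs_review(accuracies: list) -> bool:
    """Flag a coordinator when more than 20% of responses are unverified."""
    unverified = sum(1 for a in accuracies if gps_status(a)[0] == "unverified")
    return unverified / len(accuracies) > 0.20
```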

Geo-Cluster Example:

{
  "gps": { "lat": 23.8103, "lng": 90.4125, "accuracy": 12 },
  "situational": "TRN-STA",
  "settlement": "URB-1"
}

3.2 Demographic Data (Demo-Cluster)

Four demographic attributes form the demo-cluster:

Gender

| Code | Label |
|---|---|
| M | Male |
| F | Female |
| X | Other/Prefer not to say |

Age Bracket

| Code | Range |
|---|---|
| A1 | 18–24 |
| A2 | 25–34 |
| A3 | 35–44 |
| A4 | 45–54 |
| A5 | 55–64 |
| A6 | 65+ |

Occupation Category

| Code | Category | Examples |
|---|---|---|
| OCC-STU | Student | School, college, university |
| OCC-UNE | Unemployed/Seeking | Job seekers |
| OCC-HOM | Homemaker | Primary household managers |
| OCC-RET | Retired | Pensioners |
| OCC-AGR | Agriculture/Fishing | Farmers, fishers, laborers |
| OCC-MAN | Manual/Trade | Construction, factory, drivers |
| OCC-SVC | Service sector | Retail, hospitality, security |
| OCC-CLR | Clerical/Office | Admin, data entry, reception |
| OCC-PRO | Professional | Engineers, doctors, lawyers, teachers |
| OCC-MGT | Management/Executive | Managers, directors, business owners |
| OCC-GOV | Government/Public | Civil servants, military, police |
| OCC-OTH | Other | Specify in notes |

Income Level (Self-Reported)

| Code | Description | Anchor Question |
|---|---|---|
| INC-1 | Struggling | "Difficulty meeting basic needs" |
| INC-2 | Getting by | "Cover basics with little extra" |
| INC-3 | Comfortable | "Meet needs with some savings" |
| INC-4 | Well-off | "Comfortable with regular savings" |
| INC-5 | Affluent | "No financial concerns" |
| INC-X | Prefer not to say | |

Part 4: Diversity Scoring and Rewards

4.1 Core Principle: Diversity Over Volume

Coordinators are never rewarded for volume. The incentive system rewards coverage diversity across geographic and demographic clusters. A coordinator with 20 responses across 15 unique clusters scores higher than one with 100 responses from 3 clusters.

4.2 Diversity Score Calculation

Each coordinator's performance is measured by two diversity indices:

Geographic Diversity Index (GDI)

Measures spread across unique geo-clusters:

GDI = (Unique Geo-Clusters Covered / Total Possible Geo-Clusters in Area) × 100

Where Unique Geo-Cluster = unique combination of:
- GPS grid cell (500m × 500m)
- Situational location type
- Urban/Rural code

Demographic Diversity Index (DDI)

Measures spread across demographic cells:

DDI = (Unique Demo-Clusters Covered / Target Demo-Clusters) × 100

Where Unique Demo-Cluster = unique combination of:
- Gender (3 options)
- Age bracket (6 options)
- Occupation (12 options)
- Income level (6 options)

Maximum theoretical cells = 3 × 6 × 12 × 6 = 1,296
Practical target cells (based on population distribution) ≈ 100–200

Combined Diversity Score (CDS)

CDS = (GDI × 0.5) + (DDI × 0.5)
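The three indices can be computed together from raw responses. In the sketch below, the record shape (a "geo" tuple and a "demo" tuple per response) is an illustrative assumption:

```python
def diversity_scores(responses, total_geo_clusters, target_demo_clusters,
                     gdi_weight=0.5, ddi_weight=0.5):
    """Return (GDI, DDI, CDS) for one coordinator's responses."""
    # Uniqueness is set-based over the full cluster tuples
    geo_cells = {r["geo"] for r in responses}
    demo_cells = {r["demo"] for r in responses}
    gdi = 100.0 * len(geo_cells) / total_geo_clusters
    ddi = 100.0 * len(demo_cells) / target_demo_clusters
    return gdi, ddi, gdi * gdi_weight + ddi * ddi_weight
```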

4.3 Reward Point System

Points are earned based on diversity contribution, not response count:

| Action | Points | Condition |
|---|---|---|
| New geo-cluster coverage | 10 | First response in that geo-cluster for this survey |
| New demo-cluster coverage | 10 | First response in that demo-cluster for this survey |
| Repeat geo-cluster | 1 | Additional response in already-covered geo-cluster |
| Repeat demo-cluster | 1 | Additional response in already-covered demo-cluster |
| Cross-validation bonus | 5 | Response aligns with nearby coordinator (±8% margin) |
| Hard-to-reach bonus | 15 | Response from designated underserved area/demographic |

Point Decay for Repetition:

Points per repeat = 1 / (1 + repeat_count_in_cluster)

Response 1 in cluster: 10 points (new)
Response 2 in cluster: 1 / (1 + 1) = 0.5 points
Response 3 in cluster: 1 / (1 + 2) = 0.33 points
Response 10 in cluster: 1 / (1 + 9) = 0.1 points
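The decay schedule follows directly from a response's position within its cluster (the function below is a sketch; response 1 is scored separately as new-cluster coverage):

```python
def repeat_points(response_number: int, base: float = 1.0) -> float:
    """Points for the Nth response (N >= 2) in an already-covered cluster.

    The repeat count for response N is N - 1, so this implements
    base / (1 + repeat_count).
    """
    if response_number < 2:
        raise ValueError("response 1 scores as new-cluster coverage")
    return base / (1 + (response_number - 1))
```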

4.4 Leaderboard Rankings

Leaderboards display coordinators ranked by:

  1. Primary Rank: Combined Diversity Score (CDS)
  2. Secondary Rank: Cross-validation consistency rate
  3. Tertiary Rank: Protocol compliance rate

Volume (total responses) is displayed but never used for ranking.

4.5 Reward Tiers (When Sponsorship Available)

| Tier | CDS Threshold | Reward Type |
|---|---|---|
| Platinum | CDS ≥ 80 | Monetary + Certificate + Badge |
| Gold | CDS 60–79 | Monetary + Certificate |
| Silver | CDS 40–59 | Certificate + Recognition |
| Bronze | CDS 20–39 | Recognition |
| Participant | CDS < 20 | Participation acknowledgment |

4.6 Non-Monetary Recognition

| Recognition | Criteria |
|---|---|
| Digital volunteer certificate | Complete at least one survey with CDS ≥ 20 |
| "Diversity Champion" badge | Top 10% CDS in any survey |
| "Coverage Pioneer" badge | First to cover a hard-to-reach cluster |
| "Trusted Collector" badge | Trust score ≥ 85 for 3+ consecutive surveys |
| Public leaderboard ranking | All active coordinators |
| Social media shoutout | Weekly top 5 by CDS |

Part 5: Cross-Validation System

5.1 Validation Clusters

Coordinators operating in overlapping areas are grouped into validation clusters. A validation cluster requires:

  • Minimum 3 coordinators
  • Minimum 20 responses per coordinator
  • Same survey, same time window

5.2 Statistical Comparison

For each survey question, each coordinator's answer distribution is compared against the mean distribution of their validation cluster:

Deviation Score = JSD(Coordinator Distribution, Cluster Mean Distribution)

where JSD is the Jensen-Shannon Divergence (computed with log base 2, so scores are bounded):
- JSD = 0: Identical distributions
- JSD = 1: Completely different distributions

Thresholds

| JSD Score | Interpretation | Action |
|---|---|---|
| 0.00–0.05 | Excellent alignment | Trust boost (+2) |
| 0.05–0.10 | Acceptable variance | No change |
| 0.10–0.20 | Elevated variance | Flag for review |
| 0.20+ | Significant deviation | Trust penalty (-5), manual review |
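The Jensen-Shannon Divergence can be computed directly from the two answer distributions. A minimal implementation (log base 2, so scores fall in [0, 1]), plus a mapping onto the thresholds; treating the threshold bands as half-open intervals is an assumption where the table is ambiguous:

```python
import math

def jsd(p, q):
    """Jensen-Shannon Divergence between two probability vectors
    defined over the same answer options (each sums to 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability terms contribute 0
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cv_action(score: float) -> str:
    """Map a JSD score to the threshold table's action."""
    if score < 0.05:
        return "trust boost (+2)"
    if score < 0.10:
        return "no change"
    if score < 0.20:
        return "flag for review"
    return "trust penalty (-5), manual review"
```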

5.3 Handling Legitimate Outliers

Some coordinators may collect from genuinely different sub-populations. To prevent penalizing accurate outliers:

  1. Cluster Segmentation: If a coordinator's geo/demo profile differs significantly from peers, they form a separate validation sub-cluster
  2. Historical Comparison: Coordinator's results compared to their own historical patterns
  3. Manual Override: Flagged cases reviewed by Tier 1 coordinators before trust penalties apply

Part 6: Post-Stratification Weighting

6.1 Purpose

Raw survey data rarely matches population proportions. Post-stratification weighting adjusts for over/under-representation to produce population-representative estimates.

6.2 Weighting Cells

Responses are grouped into weighting cells based on:

| Dimension | Categories | Source of Population Proportions |
|---|---|---|
| Region | Administrative units | Census data |
| Settlement | URB-1, URB-2, URB-3, RUR-1, RUR-2 | Census data |
| Gender | M, F, X | Census data |
| Age | A1–A6 | Census data |

6.3 Weight Calculation

Cell Weight = (Population Proportion in Cell / Sample Proportion in Cell)

Example:

  • Population: 25% are URB-1, Female, A2
  • Sample: 35% are URB-1, Female, A2
  • Weight = 0.25 / 0.35 = 0.71

6.4 Weight Trimming

Extreme weights distort variance. Weights are trimmed:

Trimmed Weight = max(0.2, min(5.0, Raw Weight))

Cells with weights outside 0.2–5.0 are flagged as poorly sampled.
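The weight calculation and trimming reduce to a few lines (returning a flag for poorly sampled cells is an illustrative choice, not part of the specification):

```python
def cell_weight(pop_prop: float, sample_prop: float,
                lower: float = 0.2, upper: float = 5.0):
    """Return (trimmed_weight, flagged) for one weighting cell."""
    raw = pop_prop / sample_prop
    trimmed = max(lower, min(upper, raw))
    # Cells whose raw weight fell outside the bounds are poorly sampled
    return trimmed, raw != trimmed
```

Using the example above, cell_weight(0.25, 0.35) returns a weight of roughly 0.71 with no flag.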

6.5 Effective Sample Size

Weighting reduces statistical power. Report effective sample size:

n_eff = (Sum of weights)^2 / Sum(weights^2)

Results should report both raw n and n_eff.
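The effective-sample-size formula (the Kish approximation) is a one-liner:

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / sum(weights^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

With equal weights, n_eff equals the raw n; the more the weights vary, the smaller n_eff becomes (e.g. weights [2, 2, 1, 1] give n_eff = 3.6 for n = 4).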


Part 7: Anti-Manipulation Design

7.1 Structural Defenses

| Threat | Defense Mechanism |
|---|---|
| GPS spoofing | Cross-reference with IP, device fingerprint, movement patterns |
| Fake coordinators | Trust decay, referral accountability, minimum activity thresholds |
| Answer farming | Cross-validation detects anomalous uniformity |
| Demographic targeting | Diversity scoring discourages cluster concentration |
| Coordinator collusion | Random validation cluster assignment, rotation |
| Volume gaming | Zero reward for volume; diversity-only incentives |

7.2 Detection Signals

The system monitors for:

  • Velocity anomalies: Too many responses too quickly
  • Pattern uniformity: Identical or near-identical answer sequences
  • Geo-impossibility: Responses from locations too far apart in time
  • Demographic skew: Extreme concentration in single demo-cluster
  • Cross-validation failure: Persistent deviation from area peers

7.3 Response Actions

| Signal Severity | Automatic Action | Human Review |
|---|---|---|
| Low | Flag for monitoring | No |
| Medium | Reduce trust score | Optional |
| High | Suspend data acceptance | Required |
| Critical | Suspend coordinator | Required |

Part 8: Reporting Standards

8.1 Required Disclosures

All published results must include:

  1. Methodology version used (e.g., GTPS-CV v1.0)
  2. Raw sample size and effective sample size
  3. Collection period and geographic scope
  4. Weighting variables and trimming applied
  5. Coverage gaps: Which geo/demo clusters are under-represented
  6. Coordinator network size and average trust score
  7. Cross-validation pass rate

8.2 Confidence Reporting

Results presented with:

  • Point estimate
  • 95% confidence interval (accounting for design effect)
  • Margin of error

8.3 Limitations Statement

Every report includes standardized limitations:

"GTPS-CV produces probability-approximating samples through systematic field protocols, but is not a true probability sample. Results should be interpreted as indicative of population sentiment with the disclosed margins of uncertainty. Post-stratification weights adjust for known demographic imbalances but cannot correct for unmeasured biases."


Part 9: System Parameters (Configurable)

| Parameter | Default | Range | Description |
|---|---|---|---|
| trust_decay_rate | 0.85 | 0.70–0.95 | Trust multiplier for new recruits |
| learning_rate | 0.10 | 0.05–0.20 | Speed of trust adjustment |
| cv_margin_acceptable | 0.10 | 0.05–0.15 | JSD threshold for acceptable variance |
| weight_trim_lower | 0.20 | 0.10–0.50 | Minimum weight |
| weight_trim_upper | 5.00 | 3.00–10.00 | Maximum weight |
| gps_accuracy_threshold | 50 m | 20–100 m | Maximum acceptable GPS uncertainty |
| min_cluster_size_cv | 3 | 2–5 | Minimum coordinators for cross-validation |
| new_cluster_points | 10 | 5–20 | Points for new cluster coverage |
| repeat_cluster_base | 1 | 0.5–2 | Base points for repeat cluster |
| gdi_weight | 0.50 | 0.30–0.70 | Weight of GDI in CDS calculation |
| ddi_weight | 0.50 | 0.30–0.70 | Weight of DDI in CDS calculation |

Part 10: Governance and Sponsor Neutrality

10.1 Sponsor Restrictions

Sponsors cannot influence:

  • Survey question wording or answer options
  • Trust algorithm parameters
  • Geographic targeting or exclusion
  • Weighting methodology
  • Data acceptance logic
  • Coordinator selection or rewards
  • Diversity scoring formulas

10.2 Sponsor Permissions

Sponsors may:

  • Fund the reward pool
  • Request specific geographic coverage (without exclusions)
  • Receive anonymized, weighted aggregate results
  • Display brand acknowledgment in survey interface

10.3 Editorial Independence

Survey design and analysis remain under platform editorial control. Sponsor-requested surveys undergo review to ensure:

  • Questions are non-leading
  • Answer options are balanced
  • Topic is appropriate for field collection

Appendix A: Glossary

| Term | Definition |
|---|---|
| Geo-cluster | Unique combination of GPS grid cell + situational location + settlement type |
| Demo-cluster | Unique combination of gender + age + occupation + income |
| CDS | Combined Diversity Score: weighted average of GDI and DDI |
| GDI | Geographic Diversity Index |
| DDI | Demographic Diversity Index |
| JSD | Jensen-Shannon Divergence: statistical measure of distribution difference |
| PSU | Primary Sampling Unit |
| Post-stratification | Statistical adjustment to align sample with population proportions |
| Trust decay | Reduction in initial trust score across coordinator tiers |
| Cross-validation | Comparison of results between coordinators in same area |
| n_eff | Effective sample size after weighting adjustment |

Appendix B: Comparison with Original Methodology

| Aspect | Original | Version 1.0 |
|---|---|---|
| Location data | GPS only | GPS + Situational + Settlement (3-part geo-cluster) |
| Demographics | Gender, Age | Gender, Age, Occupation, Income (4-part demo-cluster) |
| Sampling protocol | Undefined ("random in public") | Systematic time-interval with location assignment |
| Reward basis | Volume-influenced points | Diversity-only (CDS-based) |
| Cross-validation | Fixed ±5–10% threshold | Statistical (Jensen-Shannon Divergence) |
| Weighting | Geographic normalization only | Full post-stratification with trimming |
| Trust formula | Described but unspecified | Published formula with parameters |
| Reporting standards | None specified | Required disclosures and limitations |
| Outlier handling | Not addressed | Cluster segmentation + historical comparison |

Appendix C: Implementation Checklist

  • Configure PSUs for target geography
  • Load census benchmarks for weighting cells
  • Set system parameters (or accept defaults)
  • Onboard Tier 1 coordinators (10–20)
  • Train coordinators on time-interval selection protocol
  • Deploy mobile app with 3-part geo-cluster capture
  • Deploy demographic collection UI with 4-part demo-cluster
  • Establish validation cluster boundaries
  • Configure leaderboard (diversity-ranked, volume displayed only)
  • Implement point decay formula for repeat clusters
  • Draft limitations disclosure template
  • Schedule first survey pilot (target: 500–1,000 responses)
  • Post-pilot: Review cluster coverage gaps
  • Post-pilot: Calibrate weighting parameters

Appendix D: Data Schema Summary

Geo-Cluster Schema

geo_cluster:
  gps:
    lat: float (required)
    lng: float (required)
    accuracy: integer meters (required)
  situational: enum [RES-APT, RES-HSE, EDU-SCH, EDU-COL, TRN-STA,
                     TRN-RDE, BUS-STP, BUS-RDE, COM-MLL, COM-MKT,
                     COM-CAF, REL-MOS, REL-CHR, REL-TMP, REL-OTH,
                     REC-PRK, REC-SPT, WRK-OFF, WRK-IND, GOV-OFF,
                     HLT-HSP, OTH] (required)
  settlement: enum [URB-1, URB-2, URB-3, RUR-1, RUR-2] (required)

Demo-Cluster Schema

demo_cluster:
  gender: enum [M, F, X] (required)
  age: enum [A1, A2, A3, A4, A5, A6] (required)
  occupation: enum [OCC-STU, OCC-UNE, OCC-HOM, OCC-RET, OCC-AGR,
                    OCC-MAN, OCC-SVC, OCC-CLR, OCC-PRO, OCC-MGT,
                    OCC-GOV, OCC-OTH] (required)
  income: enum [INC-1, INC-2, INC-3, INC-4, INC-5, INC-X] (required)
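Records can be checked against these enums with a small validator. A sketch follows; the function name and error format are illustrative, while the enum values are taken from the schema above:

```python
GENDERS = {"M", "F", "X"}
AGES = {"A1", "A2", "A3", "A4", "A5", "A6"}
OCCUPATIONS = {"OCC-STU", "OCC-UNE", "OCC-HOM", "OCC-RET", "OCC-AGR",
               "OCC-MAN", "OCC-SVC", "OCC-CLR", "OCC-PRO", "OCC-MGT",
               "OCC-GOV", "OCC-OTH"}
INCOMES = {"INC-1", "INC-2", "INC-3", "INC-4", "INC-5", "INC-X"}

def validate_demo_cluster(record: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, allowed in [("gender", GENDERS), ("age", AGES),
                           ("occupation", OCCUPATIONS), ("income", INCOMES)]:
        if record.get(field) not in allowed:
            errors.append(field + ": invalid or missing value")
    return errors
```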

Document Version: 1.0

Methodology Status: Production-Ready Framework

Next Review: After first 1,000-response pilot