Indian Rainfall Data Analysis

Exploratory Data Analysis & Visualization Project

Dataset Information

Comprehensive rainfall data analysis for Indian meteorological subdivisions

Dataset Overview

  • Source: Indian Meteorological Department (IMD)
  • Time Period: 1951 - 2014 (64 years)
  • Geographic Coverage: 36 Meteorological Subdivisions
  • Data Points: 2,300 records
  • File Format: CSV (Comma Separated Values)
  • File Size: ~250 KB

Data Properties

  • Data Type: Time Series Data
  • Frequency: One record per subdivision per year, with monthly and seasonal totals
  • Measurement Unit: Millimeters (mm)
  • Missing Values: Handled with median imputation
  • Outliers: 13 extreme events identified
  • Data Quality: Cleaned and validated

Data Dictionary

Column Name | Description | Data Type | Range/Values
SD NO. | Subdivision ID number | Numeric | 1-36
SD_Name | Name of meteorological subdivision | Text | 36 Indian regions
YEAR | Calendar year of measurement | Numeric | 1951-2014
JAN | January rainfall | Numeric | 0-300.8 mm
FEB | February rainfall | Numeric | 0-363.7 mm
MAR | March rainfall | Numeric | 0-353.9 mm
APR | April rainfall | Numeric | 0-551.5 mm
MAY | May rainfall | Numeric | 0-973.1 mm
JUN | June rainfall | Numeric | 0.4-1432.8 mm
JUL | July rainfall | Numeric | 5.6-1884.9 mm
AUG | August rainfall | Numeric | 4.0-1664.6 mm
SEP | September rainfall | Numeric | 0.1-1034.8 mm
OCT | October rainfall | Numeric | 0-669.4 mm
NOV | November rainfall | Numeric | 0-583.0 mm
DEC | December rainfall | Numeric | 0-500.7 mm
ANNUAL | Total annual rainfall | Numeric | 86.5-5553.9 mm
JAN-FEB | Winter season rainfall | Numeric | 0-545.7 mm
Mar-May | Pre-monsoon rainfall | Numeric | 0-1172.3 mm
Jun-Sep | Monsoon season rainfall | Numeric | 79.4-4537.0 mm
Oct-Dec | Post-monsoon rainfall | Numeric | 0-1133.4 mm

Geographic Coverage

The dataset covers all 36 meteorological subdivisions of India, spanning diverse climatic zones:

  • Coastal: Coastal Karnataka, Kerala, Konkan & Goa, and Andaman & Nicobar Islands
  • Himalayan: Arunachal Pradesh, Sikkim, Uttarakhand, and Himachal Pradesh
  • North-East: Assam, Meghalaya, Nagaland, Manipur, Mizoram, Tripura
  • Plains: Plains and plateau regions across Central and South India
  • Arid: Arid and semi-arid regions such as West Rajasthan and Kutch

Temporal Coverage

The dataset provides 64 years of continuous rainfall measurements:

  • Start Period: 1951 (Marking the post-independence data era)
  • End Period: 2014 (Incorporating modern climate shifts & trends)
  • Frequency: Consistent annual records with zero gaps
  • Seasonal: Pre-monsoon, Monsoon, Post-monsoon, Winter
  • Monthly: Granular 12-month metrics for precise analysis

Data Quality

Comprehensive data preprocessing performed:

  • Missing Values: 18 missing values filled with median
  • Outliers: 13 extreme events identified
  • Validation: Range checks and consistency tests
  • Cleaning: Removed duplicates and errors
  • Standardization: Consistent units and format
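The preprocessing steps listed above could look roughly like the sketch below. It is illustrative only: the rows are made up rather than taken from the IMD file, only two monthly columns are shown, and it assumes the median is computed per column across all records (the source does not say whether medians were taken per column or per subdivision).

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; values are made up, not from the real dataset.
df = pd.DataFrame({
    "SD_Name": ["Kerala", "Kerala", "Kerala", "West Rajasthan", "West Rajasthan"],
    "YEAR": [1951, 1951, 1952, 1951, 1952],
    "JUN": [650.0, 650.0, np.nan, 20.0, 35.0],
    "JUL": [700.0, 700.0, 820.0, 90.0, np.nan],
})
month_cols = ["JUN", "JUL"]

# Cleaning: drop exact duplicate rows (the first two rows are identical).
df = df.drop_duplicates().reset_index(drop=True)

# Missing values: fill each monthly column with that column's median
# (assumed scope: one median per column over all remaining rows).
df[month_cols] = df[month_cols].fillna(df[month_cols].median())

# Validation: a simple range check, since rainfall totals cannot be negative.
in_range = bool((df[month_cols] >= 0).all().all())
```

A per-subdivision median (via `groupby("SD_Name")`) would be a reasonable alternative when regions differ as much as Kerala and West Rajasthan do.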

Rainfall EDA Dashboard

Comprehensive Exploratory Data Analysis of Indian Rainfall Patterns (1951-2014)

  • Data Points: 2,300
  • Years of Data: 64
  • Regions: 36
  • Variables: 20

Dashboard Features

  • Annual Rainfall Trends
  • Seasonal Patterns
  • Monthly Distribution
  • Correlation Analysis
  • Rainfall Categorization
  • Regional Comparisons
  • Outlier Detection
  • ML Insights
  • Time Series Forecasting
  • Statistical Summary
  • Interactive Visualizations
  • Data Export Options

Technical Implementation

Data Science Stack

  • Data Manipulation: Pandas, NumPy
  • Statistical Analysis: SciPy, Scikit-learn
  • Visualization: Matplotlib, Seaborn, Plotly
  • Machine Learning: Linear Regression, K-means Clustering
  • Geospatial: Folium for mapping

Web Technologies

  • Frontend: HTML5, CSS3, JavaScript
  • Frameworks: Bootstrap 5, Chart.js, Plotly.js
  • Icons: Font Awesome
  • Design: Responsive, modern UI with animations
  • Compatibility: Cross-browser support

Installation & Setup

Prerequisites

pip install jupyter pandas numpy matplotlib seaborn scipy scikit-learn plotly folium

Quick Start

  1. Open Rainfall EDA Dashboard.html in a browser
  2. Alternatively, run a local server from the project folder: python -m http.server 8000
  3. Then navigate to http://localhost:8000

  • Average Annual Rainfall: 1,400 mm
  • Maximum Recorded: 5,554 mm
  • Minimum Recorded: 86.5 mm
  • Outlier Events: 13

Charts: Annual Average Rainfall Over the Years; Seasonal Rainfall Trends; Monthly Rainfall Distribution; Monthly Rainfall Correlation Matrix; Rainfall Intensity Categories; Top 10 Wettest Regions (Average Annual Rainfall)

Rainfall Trend Forecasting (2015-2025)

Predicted annual rainfall based on 64 years of historical data trends.
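The source does not state which forecasting model produced this chart. As a minimal sketch, assuming a simple linear trend fitted to the yearly averages and extrapolated to 2015-2025, the calculation might look like this. The series here is synthetic, not the real data.

```python
import numpy as np

# Synthetic all-India annual series (mm) for 1951-2014; NOT the real data.
rng = np.random.default_rng(42)
years = np.arange(1951, 2015)
annual_mm = 1400 + rng.normal(0, 185, size=years.size)

# Fit annual_mm ~ slope * (year - 1951) + intercept by least squares.
# Years are shifted to start at zero to keep the fit well conditioned.
slope, intercept = np.polyfit(years - 1951, annual_mm, deg=1)

# Extrapolate the fitted line over the forecast horizon in the chart title.
future_years = np.arange(2015, 2026)
forecast_mm = slope * (future_years - 1951) + intercept

for year, value in zip(future_years, forecast_mm):
    print(f"{year}: {value:.0f} mm")
```

A straight-line extrapolation ignores year-to-year variability, so any such forecast is best read as a trend indicator rather than a prediction for a specific year.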

Detailed Statistical Summary Analysis

Metric | Value | Interpretation
Mean (Annual) | 1,411.2 mm | Average annual rainfall over the period
Median (Annual) | 1,385.5 mm | Middle value, less affected by extreme events
Standard Deviation | 185.4 mm | Year-to-year variability of annual rainfall
Skewness | 0.452 | Slight positive skew (occasional high-rainfall years)
Kurtosis | -0.124 | Slightly platykurtic (lighter tails than a normal distribution)
Confidence Interval (95%) | ± 45.8 mm | Uncertainty in the estimated mean annual rainfall, not a range for individual future years
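One plausible reading of the ± 45.8 mm figure is the half-width of a 95% confidence interval for the mean annual rainfall. The sketch below reproduces that arithmetic from the reported standard deviation and the 64 years of data. The normal critical value of 1.96 is an assumption; the source does not state which critical value it used.

```python
import math

std_mm = 185.4   # standard deviation from the summary table
n_years = 64     # 1951-2014 inclusive
z_95 = 1.96      # normal critical value (assumption; a t value would be slightly larger)

# Standard error of the mean, then the half-width of the 95% interval.
standard_error = std_mm / math.sqrt(n_years)
half_width = z_95 * standard_error

print(f"standard error = {standard_error:.2f} mm")
print(f"95% CI half-width = {half_width:.2f} mm")
```

This gives a half-width of about 45.4 mm, close to the reported value. A Student t critical value with 63 degrees of freedom would give a slightly wider interval.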

Key Insights & Findings

Trend Analysis

A t-test found no statistically significant difference in annual rainfall before and after 1980 (p-value: 0.349). Linear regression likewise shows a stable trend with minimal change over the 64-year period.
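A comparison of this kind can be sketched with SciPy as follows. The series is synthetic. The choice of Welch's unequal-variance t-test and the decision to put 1980 itself in the later period are both assumptions, since the source does not specify either.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the yearly all-India averages (mm); NOT the real data.
rng = np.random.default_rng(0)
years = np.arange(1951, 2015)
annual_mm = rng.normal(1411, 185, size=years.size)

# Split the series at 1980 (assumed boundary: 1980 belongs to the later period).
before = annual_mm[years < 1980]
after = annual_mm[years >= 1980]

# Welch's t-test: compares the two period means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)

# Ordinary least-squares trend line over the whole period.
trend = stats.linregress(years, annual_mm)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"trend slope = {trend.slope:.2f} mm/year (p = {trend.pvalue:.3f})")
```

A p-value above the usual 0.05 threshold, as reported in the text, means the data do not provide evidence of a shift in the mean. It does not prove that the two periods are identical.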

Outlier Detection

13 extreme rainfall events identified, primarily in Arunachal Pradesh and Coastal Karnataka. The highest recorded rainfall was 5,554 mm in Coastal Karnataka (1961).
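The methodology mentions z-score analysis for outlier detection. A minimal sketch of that technique is shown below on made-up values. The cutoff of 3 standard deviations is a common convention and an assumption here, since the source does not give its threshold.

```python
import numpy as np

# Made-up annual totals (mm): 19 ordinary values plus one extreme value.
annual_mm = np.array([990.0, 1010.0] * 9 + [1000.0, 5500.0])

# z-score: distance from the mean in units of (population) standard deviation.
z = (annual_mm - annual_mm.mean()) / annual_mm.std()

# Flag records whose absolute z-score exceeds the assumed threshold.
threshold = 3.0
outlier_idx = np.flatnonzero(np.abs(z) > threshold)

print("outlier positions:", outlier_idx.tolist())
print("outlier values:", annual_mm[outlier_idx].tolist())
```

Because extreme values inflate both the mean and the standard deviation, z-scores can mask outliers in small or heavily skewed samples. A median-based or IQR rule is a common alternative.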

Seasonal Patterns

The June-September monsoon period contributes approximately 75% of annual rainfall, with July the peak month at an average of 343 mm.
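The monsoon share can be derived directly from the seasonal and annual columns in the data dictionary. The sketch below uses the `Jun-Sep` and `ANNUAL` column names from the dataset with two made-up rows. The subdivision labels and values are hypothetical.

```python
import pandas as pd

# Two made-up records; labels and values are hypothetical.
df = pd.DataFrame({
    "SD_Name": ["Region A", "Region B"],
    "Jun-Sep": [1500.0, 280.0],
    "ANNUAL": [2000.0, 400.0],
})

# Per-record monsoon contribution as a percentage of the annual total.
df["monsoon_share_pct"] = 100 * df["Jun-Sep"] / df["ANNUAL"]

# Overall share: total monsoon rainfall divided by total annual rainfall.
overall_share_pct = 100 * df["Jun-Sep"].sum() / df["ANNUAL"].sum()

print(df[["SD_Name", "monsoon_share_pct"]])
print(f"overall monsoon share: {overall_share_pct:.1f}%")
```

The overall figure weights wetter regions more heavily than a simple average of the per-record percentages would, so the two can differ.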

Regional Variations

Coastal Karnataka, Arunachal Pradesh, and Kerala receive the highest rainfall, while Rajasthan and Gujarat regions experience the lowest precipitation.

Clustering Analysis

K-means clustering identified 4 distinct rainfall patterns across regions, grouping areas with similar precipitation characteristics for better regional planning.
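A regional grouping like this can be sketched with scikit-learn. The features used in the original analysis are not stated, so this example assumes two hypothetical features per subdivision (mean annual rainfall and monsoon share) and generates synthetic values for 36 regions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic features for 36 "subdivisions": [mean annual mm, monsoon share].
# Both the features and the values are assumptions made for illustration.
rng = np.random.default_rng(1)
centers = np.array([[400, 0.90], [1000, 0.80], [2000, 0.70], [3500, 0.75]])
X = np.vstack([c + rng.normal(0, [50, 0.01], size=(9, 2)) for c in centers])

# Standardize so the millimetre-scale feature does not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# k = 4 matches the number of patterns reported in the text.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

print("cluster sizes:", np.bincount(labels).tolist())
```

In practice the number of clusters is usually checked with an elbow plot or silhouette scores rather than fixed in advance.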

Decadal Fluctuations

Decade-to-decade fluctuations in the annual series suggest alternating wet and dry epochs that influence overall rainfall magnitude.

Analysis Methodology

Data Collection & Cleaning

Loaded rainfall dataset from 1951-2014, handled missing values using median imputation, and performed data validation to ensure quality.

Descriptive Statistics

Calculated summary statistics, identified data distributions, and performed initial exploratory analysis to understand data characteristics.

Statistical Analysis

Applied t-tests for comparing time periods, z-score analysis for outlier detection, and correlation analysis for variable relationships.
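As an illustration of the correlation step, a monthly correlation matrix can be computed with pandas as sketched below. The data are random stand-ins for four of the monthly columns, and Pearson correlation is an assumption, since the source does not name the coefficient used.

```python
import numpy as np
import pandas as pd

# Random stand-in values for four monthly columns; NOT the real data.
rng = np.random.default_rng(7)
months = ["JUN", "JUL", "AUG", "SEP"]
df = pd.DataFrame(rng.gamma(shape=2.0, scale=100.0, size=(64, 4)), columns=months)

# Pairwise Pearson correlation between the monthly rainfall columns.
corr = df.corr(method="pearson")

print(corr.round(2))
```

Rainfall distributions are often skewed, so a rank-based coefficient (`method="spearman"`) may be worth comparing against the Pearson result.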

Machine Learning

Utilized linear regression for trend analysis and K-means clustering for regional grouping based on rainfall patterns.

Data Visualization

Created interactive charts and plots using multiple visualization libraries to effectively communicate insights and patterns.

Conclusion & Reporting

Summarized key findings and trends into a final report, providing actionable insights for agricultural planning and water resource management.

Rainfall Classification System

  • Very Low: < 300 mm
  • Low: 300-800 mm
  • Moderate: 800-1200 mm
  • High: 1200-2000 mm
  • Very High: > 2000 mm
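The classification can be expressed as a small function. This sketch assumes the categories apply to annual totals, and that a value on a shared boundary (800 or 1200 mm) falls into the higher category. The source lists those boundaries in both adjacent ranges and does not say how they are resolved. The function name is hypothetical.

```python
def classify_annual_rainfall(mm: float) -> str:
    """Map an annual rainfall total in millimetres to an intensity category."""
    if mm < 300:
        return "Very Low"
    if mm < 800:       # 300 up to, but not including, 800
        return "Low"
    if mm < 1200:      # assumed: exactly 800 counts as Moderate
        return "Moderate"
    if mm <= 2000:     # assumed: exactly 1200 counts as High; 2000 is still High
        return "High"
    return "Very High"  # strictly above 2000


# Example totals, including the minimum and maximum from the data dictionary.
for value in (86.5, 300, 1000, 2000, 5553.9):
    print(f"{value} mm -> {classify_annual_rainfall(value)}")
```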