Los Angeles Parking Citation Analysis
Introduction
Parking enforcement plays a critical role in managing urban traffic flow and generating municipal revenue. The City of Los Angeles provides open access to detailed parking citation records through its Open Data Portal. This project uses the LA Parking Citation dataset from 2021 to 2025, which contains millions of records with detailed information such as the date and time of issuance, type of violation, fine amount, vehicle make, and the geographic location of the violation (represented by latitude and longitude coordinates).
The research objective of this study is to explore structural patterns in parking violations over time. Specifically, this project aims to address the following research questions:
How has the distribution of parking violations changed from 2021 to 2025?
Which types of violations contribute to the largest number of citations and fines?
Are certain vehicle makes or body styles more likely to be linked to certain types of violations?
Do parking violations tend to cluster in certain geographic regions of the City of Los Angeles?
The following hypotheses are proposed:
A small number of violation types accounts for a large share of citations and fines.
Parking violation patterns vary over time, potentially reflecting changes in traffic volume and urban activity.
Vehicle characteristics (such as body type) are linked to variations in violation types.
Violations tend to be geographically concentrated in high-density commercial or central business districts of Los Angeles.
Through the exploration of these research questions, this research seeks to gain a deeper understanding of the structure of parking enforcement data and identify systematic patterns of parking violations in urban areas.
Methods
Data Acquisition
The data was collected from the Open Data Portal of the City of Los Angeles using the Socrata Open Data API (SODA). The citation records from 2021 to 2025 were fetched in Python using the requests library. The API calls used date-based filtering to systematically fetch all relevant records. The JSON results were then parsed into a tabular format using pandas and saved as structured CSV files for further analysis.
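The acquisition step described above can be sketched as follows. This is a minimal illustration, not the project's exact script: the resource URL and the `issue_date` field name are assumptions that must be replaced with the actual identifiers from data.lacity.org.

```python
import pandas as pd
import requests

# Hypothetical SODA resource URL -- look up the actual dataset
# identifier on data.lacity.org before running.
BASE_URL = "https://data.lacity.org/resource/EXAMPLE-ID.json"

def build_params(start, end, limit=50_000, offset=0):
    """SODA query parameters with date-based filtering and paging.

    `issue_date` is an assumed field name for the citation date.
    """
    return {
        "$where": f"issue_date between '{start}' and '{end}'",
        "$limit": limit,
        "$offset": offset,
    }

def fetch_citations(start, end, page_size=50_000):
    """Page through all citation records issued between two dates."""
    frames, offset = [], 0
    while True:
        resp = requests.get(
            BASE_URL,
            params=build_params(start, end, page_size, offset),
            timeout=60,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:  # empty page -> all records fetched
            break
        frames.append(pd.DataFrame(batch))
        offset += page_size
    return pd.concat(frames, ignore_index=True)

# Example usage (network access required):
# fetch_citations("2021-01-01", "2025-12-31").to_csv(
#     "citations_2021_2025.csv", index=False)
```

Paging with `$limit`/`$offset` is necessary because SODA endpoints cap the number of rows returned per request.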
Data Cleaning and Preprocessing
After data acquisition, an initial variable selection step was performed to keep only relevant fields required for subsequent analysis. The dataset was filtered to retain the following variables: ticket identifier, date components (year, month, day), hour, violation code, violation description, fine amount, geographic coordinates (latitude and longitude), vehicle make, registered plate state, and body style. To maintain data quality, rows with missing values were deleted. This step ensured that all remaining rows had valid entries for all the selected variables. The percentage of records lost relative to the original dataset was calculated to assess the effect of missing-value deletion.
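A sketch of the selection and listwise-deletion step is shown below. The column names are illustrative stand-ins for the thirteen retained variables; the raw dataset's field names may differ.

```python
import pandas as pd

# Illustrative names for the 13 retained variables; the raw
# dataset's actual field names may differ.
KEEP = [
    "ticket_number", "year", "month", "day", "hour",
    "violation_code", "violation_description", "fine_amount",
    "loc_lat", "loc_long", "make", "plate_state", "body_style",
]

def clean_citations(df):
    """Keep only the analysis variables, drop incomplete rows,
    and report the percentage of records lost."""
    subset = df[KEEP]
    complete = subset.dropna()
    loss_pct = 100 * (1 - len(complete) / len(subset))
    return complete, loss_pct
```

Returning the loss percentage alongside the cleaned frame makes the retention-rate check reported in the results reproducible.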
Plausibility Checks
Basic plausibility checks were performed on the key numerical variables.
Temporal validity
The temporal validity of the dataset was checked first. The year, month, and day variables were combined into a standardized date for each record, and the minimum and maximum dates were inspected to ensure that all records fell within the study period. Records with invalid date entries were identified. In addition, the hour variable was inspected to ensure that all values fell within the standard 24-hour range: minimum and maximum hour values were examined, and entries outside the permissible interval were flagged. This validation step ensured that records with invalid time entries did not affect the analysis of intraday and day-to-day temporal patterns.
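These checks can be expressed compactly with pandas, which assembles dates directly from year/month/day columns and coerces impossible combinations (e.g., February 30) to missing values:

```python
import pandas as pd

def check_temporal_validity(df):
    """Assemble dates from components and flag invalid entries.

    Returns (min_date, max_date, n_invalid_dates, n_invalid_hours).
    """
    # Invalid combinations such as 2022-02-30 become NaT.
    dates = pd.to_datetime(df[["year", "month", "day"]], errors="coerce")
    bad_dates = int(dates.isna().sum())
    # Hours must lie in the standard 24-hour range [0, 23].
    bad_hours = int((~df["hour"].between(0, 23)).sum())
    return dates.min(), dates.max(), bad_dates, bad_hours
```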
Geographic Filtering
The geographic coordinates were checked to ensure that they represented valid locations rather than data entry errors. A set of latitude and longitude limits was defined based on the known geographic extent of the study area; data points falling outside these limits were considered potentially invalid. Geographic filtering was then applied to retain only data points within the defined region.
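Using the bounding box reported later in the results (latitude 33.5–34.5, longitude −119.0 to −117.5), the filter amounts to a simple coordinate mask:

```python
import pandas as pd

# Study-area bounding box for Los Angeles.
LAT_MIN, LAT_MAX = 33.5, 34.5
LON_MIN, LON_MAX = -119.0, -117.5

def filter_geographic(df):
    """Keep records inside the study area; also count those removed."""
    inside = (
        df["loc_lat"].between(LAT_MIN, LAT_MAX)
        & df["loc_long"].between(LON_MIN, LON_MAX)
    )
    return df[inside], int((~inside).sum())
```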
Fine amount
The highest fine amounts were analyzed to determine whether extreme data points represented valid administrative penalties or were indicative of data entry errors. The fine amount distribution was examined using graphical methods, including histograms and box plots. Statistical outliers were determined using the interquartile range (IQR) method: the first and third quartiles were calculated, the IQR was defined as their difference, and the lower and upper bounds were set to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. Data points outside this range were considered statistical outliers; however, extreme values consistent with known administrative penalty structures were retained.
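The Tukey-fence computation described above is a few lines of pandas:

```python
import pandas as pd

def iqr_bounds(series):
    """Tukey fences: (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

Values outside these fences are flagged as statistical outliers; as noted above, flagged fines that match official penalty rates are kept rather than dropped.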
Frequency
Frequency distributions were created for important categorical variables. These variables included descriptions of violations, vehicle makes, and vehicle body styles. Absolute frequencies and relative proportions were calculated to determine the most frequent categories for each variable. These analyses offered a descriptive summary of the makeup of citation data before the modeling steps.
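The absolute and relative frequencies can be computed with `value_counts`, once in raw counts and once normalized to proportions:

```python
import pandas as pd

def top_categories(series, n=10):
    """Absolute counts and percentage shares of the n most
    frequent values of a categorical variable."""
    counts = series.value_counts().head(n)
    shares = (series.value_counts(normalize=True).head(n) * 100).round(2)
    return pd.DataFrame({"count": counts, "share_pct": shares})

# Applied in turn to violation descriptions, vehicle makes,
# and body styles, e.g. top_categories(df["violation_description"]).
```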
Time Patterns
Citation counts were aggregated at the year–month level to form a time series representation, and the resulting monthly series was prepared for visualization of temporal trends in citation volume. In addition, citation counts were aggregated by hour of the day to identify patterns in violation occurrences across different times; a histogram was generated to visualize the distribution of citations by hour and to highlight peak violation periods.
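Both aggregations reduce to simple groupbys on the date components:

```python
import pandas as pd

def monthly_counts(df):
    """Citation counts aggregated at the year-month level."""
    return (
        df.groupby(["year", "month"])
          .size()
          .rename("citations")
          .reset_index()
          .sort_values(["year", "month"])
    )

def hourly_counts(df):
    """Citation counts by hour of day, ordered 0-23."""
    return df["hour"].value_counts().sort_index()
```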
Preliminary Results
Dataset Overview and Variable Selection
The initial dataset retrieved from the City of Los Angeles Open Data Portal contains 9,234,940 citation records and 23 variables (6 float, 2 integer, and 15 string variables), with a total memory usage of about 1.6 GB. Following the variable selection step described in the methodology, irrelevant variables were removed. No records were deleted at this stage, but the number of variables dropped to 13 and total memory usage fell to about 819 MB, a reduction of nearly 50% relative to the original dataset.
Data Cleaning Results
To ensure data integrity across all selected variables, observations containing missing values were removed. The original dataset had 9,234,940 records; after listwise deletion, 8,605,218 complete observations remained, a retention rate of 93.18%. Although approximately 6.82% of records were removed, the remaining dataset still contains over 8.6 million observations. Given the large record count and the small share of missing data, listwise deletion was deemed appropriate, and the high retention rate indicates that removing incomplete records should not materially affect the representativeness of the dataset.
Table 1. Summary statistics of key numerical variables after listwise deletion.
| Statistic | fine_amount | loc_lat | loc_long |
|---|---|---|---|
| Count | 8,605,218 | 8,605,218 | 8,605,218 |
| Mean | 74.04 | 34.06 | -118.33 |
| Std | 42.64 | 0.25 | 0.89 |
| Min | 0.00 | -42.88 | -149.90 |
| 25% | 63.00 | 34.03 | -118.41 |
| 50% (Median) | 73.00 | 34.06 | -118.32 |
| 75% | 73.00 | 34.10 | -118.27 |
| Max | 1100.00 | 61.22 | 174.70 |
The average fine amount is $74.04, with values ranging from $0 to $1,100. The mean latitude and longitude are 34.06 and −118.33, respectively. This shows that most citations are geographically concentrated within the Los Angeles region.
Plausibility Checks
Temporal Validity
The range of observed dates is from January 1, 2021, to December 31, 2025, without any invalid date entries. This ensures that all records are within the target period of the study. The hour variable was also checked for adherence to the 24-hour clock standard. The observed hour values range from 0 to 23, and no record was found to be outside the valid range of [0, 23]. These results confirm the validity of the temporal variables.
Geographic Filtering
The geographic coordinates of the observations were also checked for their plausibility in terms of being within the Los Angeles region. Using pre-defined spatial constraints (latitude between 33.5 and 34.5, longitude between −119.0 and −117.5), a total of 1,554 records were found to be geographically invalid and hence filtered out.
Fine Amount Distribution and Extreme Values
Fine amount distributions were also checked for data entry errors and distributional patterns. The observed fine amounts range from $0 to $1,100. The highest amount, $1,100, occurs more than once and is in line with official administrative penalty rates, confirming that these values represent genuine offenses rather than data entry errors. The mean fine amount is $74.04 and the median is $73.00, indicating a slightly right-skewed distribution. Most fines fall between $63 and $73, reflecting standardized penalty amounts for common offenses. Using the IQR method, the statistical lower and upper bounds were approximately $48 and $88; amounts outside this range are statistical outliers but were retained because they represent valid administrative penalty amounts. Figures 1–3 below provide graphical evidence of the distributional pattern and the existence of extreme but valid values.
Figure 1. Distribution of fine amounts for all observations. The histogram displays the full range of fine amounts, with a concentration of values below $200 and a long right tail extending toward $1,100.
Figure 2. Fine amount distribution (zoomed to values ≤ $200). The zoomed histogram reveals the dominant concentration of fines between $60 and $80.
Figure 3. Boxplot of fine amounts (≤ $200) with IQR bounds. The boxplot illustrates the interquartile range and statistical bounds ($48 and $88). Observations beyond these thresholds are identified as statistical outliers but remain consistent with valid administrative penalty categories.
Frequency
Violation Description Distribution
For violation descriptions, citation activity is highly concentrated in a small number of recurring enforcement categories. The most frequent violation, “NO PARK/STREET CLEAN,” accounts for 28% of all citations, followed by “METER EXP.” (15%) and “RED ZONE” (12%). Together, the top three categories represent more than half of all recorded violations, indicating concentrated enforcement patterns.
Figure 4. Top 10 violation categories ranked by proportion of total citations. “NO PARK/STREET CLEAN” accounts for 28% of citations, followed by “METER EXP.” (15%) and “RED ZONE” (12%). The results indicate that enforcement activity is highly concentrated in a small number of recurring violation types.
Vehicle Make and Body Style Distribution
Frequency distributions were further examined for vehicle makes and body styles to characterize the structural composition of cited vehicles. Among vehicle makes, Toyota accounts for the largest share of citations, followed by Honda and Ford. In terms of body style, passenger cars dominate the dataset, accounting for the overwhelming majority of citations. Pickup trucks and vans follow at significantly lower proportions. Figure 5 presents the distribution of the top vehicle makes and Figure 6 presents the body styles by percentage of total citations.
Figure 5. Distribution of cited vehicles by make. Toyota, Honda, and Ford represent the most frequently cited vehicle makes, reflecting the prevalence of these brands in urban traffic.
Figure 6. Distribution of cited vehicles by body style. Passenger cars account for the vast majority of citations, followed by pickup trucks and vans, indicating that enforcement activity is concentrated on personal-use vehicles.
Time Patterns
Monthly Citation Pattern
Monthly citation counts were aggregated to examine temporal trends over the study period. Overall, citation activity fluctuates between approximately 120,000 and 180,000 citations per month from the beginning of 2021 to the end of 2025. In 2021, citation volumes remain relatively stable with moderate variation. A noticeable increase occurs in early 2022, reaching a peak of approximately 180,000 citations per month. Following this peak, citation counts decline through late 2022 and reach a local minimum in early 2023. From 2023 onward, volumes continue to fluctuate substantially, with several short-term increases and declines throughout 2024 and 2025. While overall volumes generally remain within the broad range of approximately 130,000 to 165,000 citations per month, the repeated peaks and valleys suggest recurring cyclical behavior. Overall, the time series demonstrates alternating periods of expansion and contraction in citation volume.
Figure 7. Monthly parking citation counts (2021–2025). The series exhibits substantial month to month variation, with a peak in early 2022 and recurring rises and declines in subsequent years.
Citations by Hour
Citation activity remains relatively low during late night and early morning hours (00:00–05:00). A pronounced increase begins around 7:00 AM, with citations peaking during typical daytime enforcement hours. The highest volumes are observed between approximately 8:00 AM and 12:00 PM, with a secondary elevated period extending into early afternoon.
After mid-afternoon, citation counts gradually decline, with noticeably lower activity during evening hours (after 7:00 PM). This pattern suggests that enforcement activity is concentrated during standard working hours rather than being evenly distributed throughout the day.
Figure 8. Distribution of parking citations by hour of day. Citation activity is highly concentrated during daytime hours, with peak volumes between 8:00 AM and 12:00 PM. Enforcement intensity declines substantially during late night and early morning periods.
Summary
This paper examines structural patterns in parking citation data in the city of Los Angeles from 2021 to 2025. With more than 8.6 million clean citation records, several structural patterns have been discovered in parking citation data.
First, the number of citations follows a temporal pattern. Monthly citation counts vary from 120,000 to 180,000, with a prominent peak in early 2022, a drop into early 2023, and periodic fluctuations thereafter. These findings confirm the hypothesis that parking citation activity follows a temporal pattern rather than a constant rate. Moreover, hourly aggregation shows a prominent diurnal pattern: citations are predominantly issued during working hours, specifically between 8:00 AM and 12:00 PM.
Second, there is a prominent concentration of violation types. A few violation types contribute to a disproportionately large number of total citations. “NO PARK/STREET CLEAN,” “METER EXP.,” and “RED ZONE” alone contribute to more than half of the total citations. This finding strongly supports the hypothesis that a few violation types dominate overall parking citation activity.
Third, vehicle characteristics follow a structural pattern. A few body styles, specifically passenger cars, dominate the total number of cited vehicles, as do a few makes, specifically Toyota and Honda. This finding strongly supports the hypothesis that parking citation activity primarily involves common personal vehicles rather than specialized vehicles.
In conclusion, the findings of this paper strongly support that parking citation activity in Los Angeles is not randomly distributed in time, violation type, or vehicle characteristics. Rather, parking citation activity strongly follows a structural pattern of concentration, periodicity, and dominance.
Plan for Final Project
- Machine Learning Modeling
The primary objective is to model fine amounts and citation patterns.
Feature engineering will include temporal variables (month, hour), encoded violation types and vehicle characteristics, and basic spatial features derived from latitude and longitude.
Tree-based regression models will be applied to model fine amounts and rank feature importance. Model evaluation will be done using cross-validation and metrics such as RMSE.
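As a preliminary sketch of this planned step, the snippet below fits a random forest regressor (one possible tree-based model) and evaluates it with 5-fold cross-validated RMSE. The data here is synthetic placeholder input; the real pipeline would use the engineered citation features described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for engineered features (month, hour, encoded
# violation type, coordinates) and fine amounts -- placeholders only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = 60 + 20 * X[:, 0] + rng.normal(scale=5, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(
    model, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse = -scores.mean()  # sklearn returns negated RMSE for maximization
```

After fitting, `model.feature_importances_` would provide the planned feature-importance ranking.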
In addition, a time series aspect will be considered to model monthly citation data and explore seasonal behavior. A Generalized Additive Model (GAM) can also be applied to model nonlinear relationships between time and location in an interpretable way.
- Website Development
A web application will be built to display key results and modeling outputs. The application will feature summary statistics, time series plots, and modeling prediction results.
- Visualization
The project will include clear visual representations of temporal trends, geographic patterns, and model feature importance. Selected visual elements (e.g., year or category filtering) will allow basic user exploration, while maintaining clarity and interpretability.