Relevance Drives Rigor: Exploring Open Data with R
Ryan Koch (00:02.178)
Christian, thank you so much for joining us here on Civic Tech Chat. Could you introduce yourself and tell us a little bit about what you do?
Christian Martinez (00:09.954)
Yes, absolutely. Thanks so much for having me. My name is Christian Martinez. I am born and raised in New York. I have a master's degree in cognitive neuroscience from the CUNY Graduate Center. I currently operate my own analytics company called Angles Analytics. And a lot of the fun work that I'm doing recently is through Brooklyn College, another CUNY school, where I'm an adjunct.
Ryan Koch (00:35.891)
And what would you say is your personal why? A thing that drives you to get out of bed each morning and do all those things.
Christian Martinez (00:43.01)
Great question. A lot of it is, I sometimes describe myself as a creative opportunist, where I see these opportunities to help whether it's individual people or communities, and I try to see if I can solve problems. Through those problems being solved, hopefully I can bring people together and build some sense of community.
Ryan Koch (01:06.325)
Your background includes teaching and involvement in NYC's open data community. How did those two things come together for you?
Christian Martinez (01:15.15)
Yeah, it's an interesting story. I was getting my master's. This was probably about 2019 or 2020, something like that. And I got an email through the education sphere about New York City Open Data Week. So the New York City Open Data platform, in conjunction with BetaNYC and the Mayor's Office of Technology and Innovation, each year host their own conference all about New York City Open Data. And I was like, wow, this sounds awesome. So cool that there are just a bunch of different projects, talks, et cetera, using open data relating to New York City. And I was like, I think I want to do this one day. So I had graduated and I was still very interested in doing something for New York City Open Data Week. So I gathered my friends who were fans of the Knicks, the basketball team.
I said, hey, let's do an analysis on the New York Knicks and let's do it with the hope of presenting it at New York City Open Data Week. We put in a lot of elbow grease, the project gets picked, and we present at New York City Open Data Week. And I then find out that they have this program called New York City Open Data Ambassadors, where volunteers like myself would host maybe a few introductory sessions per quarter for the general public on how to use the New York City Open Data platform. So I went through maybe a four-week course, just meeting with different people from BetaNYC and the mayor's office on the platform and how to teach it. And that's how I became an ambassador.
And then I very quickly found out that my students needed a little more oomph if I really wanted them to learn how to use R in statistics and analytics. And I found that relevance drives rigor. So that's how I decided to merge the world of New York City Open Data and my classroom.
Ryan Koch (03:22.555)
I like that phrase, relevance drives rigor. That's quite good.
Christian Martinez (03:25.452)
Yeah. Thank you. It's the truth, as I've found.
Ryan Koch (03:31.732)
And we use that phrase, open data, it came up a lot in your answer to that last question. There's probably folks coming into this a bit fresh, maybe that's a concept that hasn't been in their daily life and whatnot. When we use that term, what are we referring to and why is open data so important?
Christian Martinez (03:48.918)
Wow, so that's an important question. Not only what is open data, but why is it important? Do I have time for a quick little history lesson?
Ryan Koch (03:58.254)
Go right ahead. I love a history lesson.
Christian Martinez (03:59.542)
Yes. Thank you. So this is part of the talk that I give when I do the introductory classes for people. We go to New York back in, like, the mid-to-late 1800s. And there's this guy, Boss Tweed, who's mayor of New York City. And he's extremely corrupt. He runs the Democratic Party at the time, a lot of nepotism, and he's really just hoarding a lot of the city's resources. New York gets fed up and they're like, hey, we don't want this anymore. So not only does Boss Tweed get out of office, but they create what's now and was then called the City Record. So anything that was happening in the city, like a new building going up or new jobs, et cetera, was being posted in the City Record. So it's the first time that people were actually aware of what was going on in the city.
We then get like a hundred years later, into the 1960s, 1970s, and we get to the FOIL laws, Freedom of Information Legislation, if I have my acronym right. So what the federal government and what a lot of state governments, including New York City, did was say, you can now request government data. So it's the first time, I mean, the City Record too, but it's the first time where data is really open, where you're like, hey, I've got a question. Let me get the data from the government.
And while this is great, the problem is that you have to know what you're looking for, because if you don't know what you're looking for, how are you going to request it? So then we get to, I think, 2012, where we get the freedom of information law. I could get that a little wrong, but New York City says, hey, in perpetuity, we are going to make all data around New York State and New York City open, which means that you no longer have to request it. Now we have it available so you can pull it at any time, anywhere. I'd say that's the best way to describe open data: data that you don't need any permission to use. You don't need specific access or a request. As long as you have access to the internet, you can gather it. And if we go into the New York City Open Data Portal today, there are 3,000 different data sets.
Christian Martinez (06:24.309)
I don't know how many billions of rows of data, and I think there are 5,000 unique columns, so trillions upon trillions of different data points that you can use at any time, anywhere.
Ryan Koch (06:35.672)
When you describe it like that, it sounds maybe like it's in a healthy place right now. Would you say that that's kind of the state of things as we think about it today?
Christian Martinez (06:42.815)
Absolutely. Yeah. I think with New York City Open Data, I mean, this may sound trivial, but the fact that you can access New York City Open Data from anywhere in the world really shows how powerful it is. You don't have to just be a New York City resident or be inside New York City. You can do it from South Korea or from any of the other 50 states. And no, not everything is on there. A lot of the time there's sensitive information that can't be, which is fair. But also, let's say you're looking for a data set and you can't find it. You can actually submit a request to New York City Open Data and say, hey, I would like to use this data. Is it available? And if they say yes, then you're the trendsetter and you're the reason why this data set could be made available for other people to use. So because of that, I'd say it's in an incredibly healthy space right now.
Ryan Koch (07:43.815)
Part of your work has involved using a programming language called R to put packages together. For folks that are unfamiliar with the language, how would you describe R and the things that it's good at and useful for?
Christian Martinez (07:58.446)
So for me, I always describe it as love at first sight. A lot of people don't feel that way, including a lot of my students. But for me, it was love at first sight. So I first learned about R in 2019. I was taking a statistics course in my master's program. I had the option of taking neuroanatomy or statistics my first semester, and I thought that statistics would be more relevant early on.
Ryan Koch (08:03.165)
Yeah
Christian Martinez (08:28.693)
And then I hit the ground running. R is an open source tool that is fantastic at statistical analysis and graphs as well. Graphing, not only statistics, but numbers, anything you really want. Now, can it be as versatile as something like Python? Yes, you can do web scraping, et cetera, but it's really meant for statistical analyses, working with numbers, and working with data. It's very influential in the academic community. And what makes it so powerful, like a lot of open source platforms, is that people can contribute. So if you have an idea, let's say you're trying to do something in R and it's not possible yet, you could be the one that builds it and influences the whole R community. And the best part about it is that it's free. And so in my family, we have the saying, if it's for free, it's for me.
Ryan Koch (09:25.693)
That's a good motto, especially in this economy.
Christian Martinez (09:29.389)
Yeah, and there are a lot of programs, let's take Microsoft Excel, for example, where you have to pay for it. And so you can replace Microsoft Excel with R. You don't have to pay for it, and it has way more capabilities than Excel does.
Ryan Koch (09:52.881)
What was the moment that made you combine that enjoyment and, well, love at first sight that you found with R and the need for folks to access open data portals and decide, hey, you know what, like, I need to do that open source thing and build a package for this.
Christian Martinez (10:09.037)
Yeah, first of all, I never started my R journey and was like, I am going to be a package developer. I actually was like, wow, this is something that's going to be way too advanced for me no matter what level I am. And it wasn't something that I was aspiring to. What really happened was I saw an opportunity to fix a problem and I jumped on it.
I started teaching at the Master's program, the Psychological Resource Master's program in the Psych Department in Brooklyn College, and they needed someone to teach students R. Here I go. I suggested that relevance drives rigor, and I was trying to teach students R with data sets that were not relevant whatsoever.
I don't know if anyone that's listening has used R, but it comes with preset data sets. For instance, the mtcars data set, which is very popular and it's all about cars, but none of my students like cars. So when we're talking about cylinders, number of cylinders or how fast something can go or anything cars related, I mean, I'm putting them to sleep.
I say, how can I make my class more engaging, because that's the only way that these students are really going to learn? And my number one priority is for them to learn. So I said, what if I change the data sets? What if I start incorporating New York City Open Data into my teaching instead of using these arbitrary base R or other arbitrary data sets? Great idea. However, to connect to R at the time, there were a few different ways. One, we'd have to download the data from the New York City Open Data Portal and upload it to R, and that can be cumbersome, especially if you want to use daily data. For instance, the 311 data set is updated daily. So if we want to use the most up-to-date data each week, we'd have to re-download and upload, et cetera. And now there's folders and naming, et cetera.
Christian Martinez (12:29.901)
So I wanted to get rid of that. The second way is we can connect with APIs, but my students were brand new at R and I needed to make sure that I didn't lose them. So I didn't want to take time just to talk about APIs and how computers talk to each other. I would be getting away from what I really wanted. So part of teaching R is teaching about packages, how to install them, and their power. I said, screw it, let's just make my own package. Let's see if I can make the New York City Open Data package. I can help myself and these nine students. And that's how it came to be.
Ryan Koch (13:12.218)
I feel like we have a joke in there somewhere about the cars data set being unpopular at a New York school. I feel like there's something in there about no one in New York having a car or something.
Christian Martinez (13:19.693)
Yeah, that's a good feeling. We could work on that.
Ryan Koch (13:27.163)
So you mentioned like not wanting to teach kind of that API interaction stuff. Like how toilsome was that for folks like before packages like that existed? Like what kind of a process is it to then try to kind of like do that yourself?
Christian Martinez (13:43.468)
Well, first, let me say that the New York City Open Data Portal makes it extremely easy to access their API. Each different data set comes with an API endpoint that you can copy and paste. So that is great by them. I'd say in R there is the RSocrata package that you can use. You have to understand a little bit.
First, if I want to use all these different data sets, I have to get all their different JSON links, or endpoints, excuse me. So I've got to make sure I keep track of that. And if I want to keep track of that and use it, then I have to use maybe the httr package or a few other packages. So it can be a little cumbersome. And if someone's trying to learn R in the beginning, visually, it could look like, whoa.
This is a little out of my
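To make the "copy and paste an endpoint" step concrete, here is a minimal sketch of what one of these endpoint URLs can look like. It is written in Python purely for illustration and is not the RSocrata or httr workflow mentioned above. The `build_endpoint` helper is hypothetical, and the data set ID `erm2-nwe9` (assumed here to be the 311 service requests data set) and the `borough` column name are assumptions to check against the portal. The `$limit` and `$where` parameters follow the Socrata query convention. Nothing is fetched; the code only assembles the URL.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the portal serves each data set at /resource/<dataset-id>.json
BASE_URL = "https://data.cityofnewyork.us/resource"


def build_endpoint(dataset_id: str, limit: int = 1000,
                   where: Optional[str] = None) -> str:
    """Assemble a query URL for one data set (hypothetical helper)."""
    params = {"$limit": limit}          # cap the number of rows returned
    if where:
        params["$where"] = where        # optional row filter
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params)}"


# Assumed data set ID and column name, for illustration only.
url = build_endpoint("erm2-nwe9", limit=5, where="borough = 'BROOKLYN'")
print(url)
```

Keeping track of one such URL per data set, plus the request and parsing code around it, is the bookkeeping a beginner would otherwise have to manage by hand.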
Ryan Koch (14:46.269)
I see. It sounds like maybe getting into like intermediate level kind of stuff like dependency management and then you kind of have to understand a little bit about HTTP requests kind of stuff probably at that point and you probably have to manage like API key kind of things or auth tokens or yeah, like that's probably stuff that you don't want to do when you're just getting past hello world with the language.
Christian Martinez (14:49.474)
Yeah.
Christian Martinez (14:57.867)
Yeah, yeah, yeah.
Christian Martinez (15:04.661)
Yeah, I really don't want my students to be even more overwhelmed and that's something that I thought could push them over the edge.
Ryan Koch (15:17.498)
So I recall from our prep conversation that you got to work on the package at the city level, but as you were kind of working along, you discovered that there was a state portal with similar capabilities, which then led you to do more building. I think you ended up with like three different packages, if memory serves. What was that moment of discovery like? Maybe we can live a little bit vicariously through you.
Christian Martinez (15:33.749)
Yeah, so far.
Christian Martinez (15:40.451)
Yeah. So imagine that I've been working with the New York City Open Data Portal for a few years now, and I'm very comfortable with it. I was in awe when I first started with it. And now I'm so happy that it's been there and I'm pretty advanced in it. So it was maybe the last day or the day before the last of New York City Open Data Week 2026, the big conference that they have every year.
And there was a talk from New York City Council data team members. I was like, wow, that seems so fun. I'd love to see how the city council is using data to influence policy.
So I go and all of a sudden they pull up a different portal. It looks almost the same, but the coloring is a little different. And it's the New York State Open Data Portal. And I am totally in shock. I couldn't believe it. I was living under a rock thinking that there was only the New York City portal and not like a New York State or anything else. And as soon as I saw that, I was like, I have to do this too. It really...
Ryan Koch (16:44.023)
haha
Christian Martinez (16:51.585)
Being from New York and working in New York with CUNY, I was like, I have to make sure that as much data as possible can be made available for my students or anyone else, because New York City is part of New York State and they talk and they're somewhat one entity. So that's when I started to build the New York State open data package. Now, in somewhat of a confusing manner, New York City has what's called the MTA, the Metropolitan Transportation Authority. So buses, subways, et cetera.
The MTA is not technically a New York City agency. It works for New York State, even though it just lives in New York City. And it kind of is on the New York State open data portal, in kind of its own section. So I said, okay, what if I make the MTA open data package? This way, it's kind of like the intermediate or bridge between New York City and New York State, which it kind of acts as literally.
So now I have a small little New York open data ecosystem.
Ryan Koch (18:07.315)
Were there any lessons that you took as you were taking your first, like... I'm gonna start the question over. Were there any lessons you took from building the first package that then kind of helped you out as you, say, built the second and the third?
Christian Martinez (18:14.359)
Please.
Christian Martinez (18:24.213)
Yeah, so that's a fantastic question. When I first created the New York City Open Data package, I submitted it to rOpenSci, which is a community with the drive to make R as fantastic as possible. And it's really like a stamp of approval. So in the review process, when you submit a package to rOpenSci, there's a review of the package. You have an editor review it and then three independent reviewers, all with the intent of, we're part of the R community. We want to make this as strong as possible. And sometimes people are too close to the sun and they don't even know that, hey, you can do something a little better. And that'd be either making your package more streamlined or easier to maintain or maybe impacting more people.
In my first iteration of the New York City open data package, I had almost 40 different functions. Here was my thought. There are, when you go to the New York City open data portal, they have a list of the most viewed. So I said, okay, let me just take the top, whatever there were and make a function that pulls just that data set.
Christian Martinez (19:56.118)
This way, if you were someone that didn't know what you were looking for and wanted to explore using R, you could see 40 different functions and be like, oh, I know what 311 is. I know what motor vehicle crashes is, because it literally says that in the function name.
From a reviewer standpoint and from a maintenance standpoint, terrible though. The fact that if you have to change something about your code, you have to change it in 40 different places means that you're going to mess up. And sometimes I'm not the most keen on attention to detail. So that could lead to one spelling error or one line of code being deleted. And now the whole package is in disarray.
Ryan Koch (20:28.343)
Mmm.
Christian Martinez (20:42.997)
So thankfully, in this iteration, we were able to take the metadata data set on New York City Open Data that hosts all of the API endpoints and the metadata regarding them. There are over 3,000 different data sets, I believe, and all the information about them, and we use that to pull any data from New York City Open Data. So now there are three different functions instead of 40. The first one is, hey, let's get a list of any data set that's available to you. The second is, hey, using that list, put in the name that you want and we can pull the data set. And the third is, just in case you found a data set that is not on that meta list, you can put in the API endpoint and it'll pull it right in for you.
So thankfully, this was all done before I did the New York State and MTA open data packages, because I would have written, I don't know, anywhere from 100 to 200 functions maybe, depending on the popularity of some. And that would have been, from a maintenance standpoint, a nightmare and really not the best way to optimize the packages.
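The shift from about 40 per-data-set functions to three general ones can be sketched as follows. This is an illustration in Python, not the package's actual R code. The function names, the two catalog entries, and the example.org endpoints are all hypothetical stand-ins for the portal's metadata data set. The `fetch` argument is a placeholder for whatever performs the real HTTP request, so the sketch runs without network access.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the metadata data set: data set name -> API endpoint.
CATALOG: Dict[str, str] = {
    "311 Service Requests": "https://example.org/resource/aaaa-1111.json",
    "Motor Vehicle Collisions": "https://example.org/resource/bbbb-2222.json",
}

Fetcher = Callable[[str], List[dict]]


def list_datasets(catalog: Dict[str, str] = CATALOG) -> List[str]:
    """Function 1: list every data set name the catalog knows about."""
    return sorted(catalog)


def pull_by_endpoint(endpoint: str, fetch: Fetcher) -> List[dict]:
    """Function 3: pull from an endpoint directly, even if it is not in the catalog."""
    return fetch(endpoint)


def pull_dataset(name: str, fetch: Fetcher,
                 catalog: Dict[str, str] = CATALOG) -> List[dict]:
    """Function 2: look a name up in the catalog, then reuse the endpoint puller."""
    if name not in catalog:
        raise KeyError(f"Unknown data set: {name!r}; see list_datasets()")
    return pull_by_endpoint(catalog[name], fetch)


def fake_fetch(endpoint: str) -> List[dict]:
    """Placeholder fetcher that records which endpoint was requested."""
    return [{"source": endpoint}]


print(list_datasets())
print(pull_dataset("311 Service Requests", fake_fetch))
```

Because every data set goes through the same lookup and the same fetch path, a change to request handling happens in one place instead of 40.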
Ryan Koch (22:00.285)
That's such a good lesson. Actually, I think folks out here will probably do well to learn from that story. It's very much like, where do you put your layer of abstraction in the technical design of something, kind of a thing? It's like you started with, my abstraction is about the data sets, right? And now you've shifted to, it's about the actions you take, which is a very interesting kind of refactor to be doing.
Christian Martinez (22:20.695)
Yeah.
Christian Martinez (22:24.683)
And I didn't mention this, but it's pertinent to note that the package was really meant for myself and my students, but mostly my students. I didn't think that anybody else would want to use this. It's crazy that other people have, but my thought was, okay, my students have an interest in these data sets. Let's make it as easy as possible for them. So the first functions that I wrote in the package were just pulling the data sets that they had a desire for.
And then it started with, all right, let me use the most popular ones. And then it gets to 40. So is it less direct? Maybe, but it's definitely more streamlined and there's way more opportunity now.
Ryan Koch (23:08.126)
And I think something that's important for folks to pull out of the story also is that you probably wouldn't have gotten to the place where you're thinking about how it could be more streamlined if you didn't at least get to the point where you could pull some data sets. So if you're just starting a project and you're like, I'm just gonna make this function for this data set so I have something that works, that's fine to start. You can always make something better, right?
Christian Martinez (23:20.898)
Yeah.
Christian Martinez (23:32.31)
Yeah, I am a big believer in just start, like just get something down. I even remember when I was younger in school, the hardest part of writing an essay for me was just starting. But once I started, I was able to fly by. And to that exact point, my first data set was just going to be the 311. I was going to see if I could just have a package with the 311 data set. And I submitted it to CRAN and they said, hey, is this package done? There's only one function.
Like you could have one function, but we typically like if there's finished products. And I was like, all right, let me do more then. And so it went from one to 40 to now 3000.
Ryan Koch (24:18.548)
I'd like to connect this a bit with some of the education stories you've been talking about. You mentioned that mtcars thing where you discovered like, hey, I have this data set, people aren't super into it. So then you went on an adventure to kind of get more relevant data sets. That kind of quest for relevance ended up putting your students in a place where there's, I believe you told me, an electronic book up with nine chapters, each representing some sort of like
Christian Martinez (24:23.661)
Please.
Christian Martinez (24:43.575)
Yeah.
Ryan Koch (24:46.544)
interesting exploration of a question that they did. And as I kind of click through it, there's a ton of breadth and variety in the types of questions that folks were interested in, and nerding out about or exploring. What are these projects like? And what does their diversity say about the benefit of letting folks kind of follow their curiosity with this sort of learning?
Christian Martinez (25:08.609)
Fantastic question. So if I may jump back a little bit, I've been teaching for a good amount of time. And what I have found is that when you provide students an opportunity to be creative, they freeze. And it's unfortunate because they've got so many good ideas and there's so much opportunity and potential. And I really want them to explore it. Hey, you got a question? Go answer it. Let's have fun.
It's okay, let's make mistakes. If the question doesn't work out, no problem, we'll get another question. It's not that serious. And there's so much merit in exploration. And I think that's really what the book represents.
The final project was a research project and the goal was to potentially present it at New York City Open Data Week. I didn't know if they would be able to, or if it'd be good enough, or if it fit the theme, but I wanted them to create something that could be presented. So I gave them two rules. One, it had to be about New York City, and two, it had to use open data. Any question that you had relating to that? Doesn't matter, it just has to fit those two criteria. And all nine of my students kind of freaked out a little bit. Professor Martinez, what should I do? What should my topic be? I don't know. What do you like to do? Well, I like to eat at restaurants. Okay, so let's answer a question about restaurants. What else do you like to do? Well, I like to go to museums. Okay, can you create a relationship or a question that pairs the two?
Well, let me see about that. There is so much opportunity in...
Christian Martinez (27:03.339)
When you look at my students' projects, it maybe is the first time in their academic career where they get to look at something fun and interesting to them. And it allows them to hone the skill instead of just learning a tool. Because my whole class was learning R, and they were going to have to learn R no matter what, whether they had this research project or not. But now they've got something that is tangible and exciting, and relevance drives rigor. So now it's relevant to them. And so one of my students wanted to see if there was an impact on basketball players' performance and team performance if they played at Madison Square Garden or not. Another one wanted to see if there was a relationship between mold complaints and domestic violence in New York City. Another one wanted to see if there were better restaurants surrounding museums in New York City. And so all beautiful and eclectic topics.
And it's amazing because they can be proud of it. They have told me that they went to their parents: hey, look what I did. This is so cool. I would describe it as refrigerator-worthy instead of just another assignment.
Ryan Koch (28:17.755)
That's so cool. Something that occurs to me with that is, I think as you were saying, it's not even really just about learning R at that point. By pushing to get these questions about things you're interested in, they're kind of learning how to basically design natural experiments. They're engaging with the scientific method and those skills, whether you're using R or some other tool, because a job might make you use one someday. Those skills are still relevant in any of those contexts.
Christian Martinez (28:34.817)
Yes.
Christian Martinez (28:48.761)
100%. And number one, you can have fun with things, which I think is an underutilized motive. And number two, you're exactly right. It's, let's build on the skills that we have learned and let's build something creative and something that's relevant to you. Because if you're more interested in it, you're more likely to make a better product and have more fun with it, and maybe not procrastinate as much, and you get to learn how you actually work best.
Ryan Koch (29:25.747)
As I looked through the different chapters that folks had made, I noticed that the way they're put together, it's as though they're kind of designed to be readily reproducible. Like I could take these code snippets they built there, and if I pulled the same data, I could run it and go, I found the same result. I can see that this thing you built is legit. Was the way it's structured, that approach, intentional, or did it kind of emerge as the projects were coming along?
Christian Martinez (29:55.438)
Great question. No, that was the axiom of the entire project. In a more condensed title, the class that I was teaching was reproducible research using R. The goal was to have a class that helped students learn what to do after they've done their experiment and they have their data. And I cannot stress how important reproducibility is in the scientific community, because that's what validates your findings. You want your experiment to be reproducible. You want it where if someone runs it 100 times, a thousand times, 100,000 times, you get the same results. Because if I do it one time and I get something and I do it the exact same way and get another, that's not reproducible. That's not fact anymore. That's not something that we can utilize. You wouldn't want the new vaccine or the new medicine to have different results every time they studied it. And so when my students, who are now completing their theses, do their work, you want anyone in the scientific community to be able to replicate it so that they know it's legit. And so same thing with this. Sure, we're not working with a big company, and this is just for fun and a research project, but...
The underlying axiom is still true. We need to make sure that our work is reproducible. It's the most important part, in my opinion.
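One small, concrete piece of "run it 100 times and get the same results" is fixing the random seed whenever an analysis involves random sampling. The sketch below is in Python for illustration (R's counterpart is `set.seed`). The `bootstrap_mean` function, the seed value, and the `ratings` numbers are all made up for the example and do not come from the students' projects.

```python
import random
import statistics
from typing import List


def bootstrap_mean(values: List[float], n_resamples: int = 1000,
                   seed: int = 42) -> float:
    """Average of resampled means; the fixed seed makes every run identical."""
    rng = random.Random(seed)  # local generator: result depends only on inputs and seed
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data.
        sample = [rng.choice(values) for _ in values]
        means.append(statistics.mean(sample))
    return statistics.mean(means)


ratings = [3, 4, 4, 5, 2, 5, 4]  # invented numbers, for illustration only

first = bootstrap_mean(ratings)
second = bootstrap_mean(ratings)
print(first == second)  # True: same data and same seed give the same answer
```

Without the fixed seed, each run would draw different resamples, and two people running the same script could report slightly different numbers.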
Ryan Koch (31:32.143)
That does seem like a really critical lesson, particularly for students, right? Because as you kind of go into life, there are many situations where, I guess if you make something that's reproducible, that maybe also gives you the skill of knowing what to look for to see if you can reproduce something else, right? Like whether you're at work or you're reading about current events, oftentimes you run across a paper or something, and maybe you're curious and you want to try to test it yourself.
So it seems like a valuable skill that folks are kind of learning as they build something that allows for that.
Christian Martinez (32:05.741)
I would like to think so, and even reproducibility within their own self. Okay, let's write a comment to describe what this piece of code does. Because how many times, at least in my world, I've gone back to code that I thought was great and I've looked at it and been like, what does this even do? And so even outside of other people using it, it's making sure it's reproducible within your own individual self.
Ryan Koch (32:38.362)
Man, that "what does this even do" reaction? That's like me looking at my code from six months ago.
Christian Martinez (32:43.181)
Yeah, and sometimes it's brutal. You're like, what did I, who wrote this? I didn't write this.
Ryan Koch (32:53.477)
When you think about your experiences working with New York City and New York State's respective open data infrastructure, what are things that you're thinking are going well right now and things that you wish would be different?
Christian Martinez (33:08.589)
Great question. Number one, I have to give them so much credit. First of all, very user friendly and they really want people to access it. New York City Open Data Portal has, I think maybe once a month, if not once every few months, an introductory course. So that's what I teach as an ambassador. And it's for free and you can come and learn how to use the basics.
So imagine you didn't know what the New York City Open Data platform was like. You could take this course and you'll come out having a pretty good understanding of what it offers and what to do. I also have to give them credit in that it's very user-friendly, and with whatever update they just made, it is so fast now. Before, let's say I wanted to make a map, right? I had 311 requests and I had the last, whatever, 100,000. I'd have to filter so maybe there's only 500, or else the rendering of my map would take so long, you'd think you'd get the circle of death, just spiraling and spiraling. Now it's almost instantaneous, which is a huge impact on not only my teaching, but on people who want to create things on their own. I think if I had to make a suggestion, one thing that's available but that I did not know about is that they have this metadata data set, which has the metadata on all the different data sets: what category they're in, how many views they have, downloads, et cetera. And that's not something that had been made known to me until last month.
So I'd have been an ambassador for too long. And so I think that could be promoted a little bit more because maybe that's something that other people can utilize and increase their potential to use the platform.
Ryan Koch (35:21.071)
Let's say there's some intrepid, civic-minded person out there, maybe they haven't had a chance to be in your class yet, and they're listening to this and they're like, man, I wanna learn this stuff. I wanna learn how to use open data to answer questions. I wanna learn these kind of research skills. What sort of advice would you give them as they just try to get started to kind of harken back to our, you know, it's important to get started thing from before?
Christian Martinez (35:46.538)
I would say number one, play around and make mistakes. I don't know how relevant this quote is, but yesterday I was in the Posit Data Lab. They have one, I think, once a month. And one of the presenters said, don't let AI take away your stupidity. Really meaning, just make some mistakes, have fun, play around. And so if you are an intrepid person that's trying to get involved, first go on the New York City Open Data Portal and try to answer a question. Even something as simple as: who complains the most about potholes in each borough? Something like that.
If you are a little more advanced and know how to use R or a different programming language, go use the New York City or New York State or MTA package and do the same thing. Explore, build a graph, something like that. Take a question you would like to see answered in your city and see if you can answer it in New York City.
Have fun and reach out to me. I'm so interested in partnering with other people and exploring how much more this ecosystem can grow.
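A minimal sketch of the pothole question above, written in Python rather than the R packages mentioned in the conversation. The rows are made up for illustration, and the field names `borough` and `descriptor` are assumptions modeled on 311-style service request data, not something confirmed in the episode.

```python
from collections import Counter

# Made-up sample rows for illustration only. The field names are
# assumptions modeled on 311-style service request data.
sample_requests = [
    {"borough": "BROOKLYN", "descriptor": "Pothole"},
    {"borough": "QUEENS", "descriptor": "Pothole"},
    {"borough": "BROOKLYN", "descriptor": "Pothole"},
    {"borough": "BRONX", "descriptor": "Street Flooding"},
    {"borough": "QUEENS", "descriptor": "Noise"},
    {"borough": "BROOKLYN", "descriptor": "Pothole"},
]


def pothole_counts_by_borough(rows):
    """Count rows whose descriptor is 'pothole' (case-insensitive), per borough."""
    return Counter(
        row["borough"]
        for row in rows
        if row["descriptor"].strip().lower() == "pothole"
    )


counts = pothole_counts_by_borough(sample_requests)

# List boroughs from most to fewest pothole complaints.
for borough, n in counts.most_common():
    print(borough, n)
```

With real data, the same counting step would run on rows downloaded from the portal. Raw counts favor bigger boroughs, so dividing by population or street miles would be a natural next step.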
Ryan Koch (37:01.166)
I mean, that pothole comment takes me back a bit. Folks who've listened to this podcast for a long time might remember that we had an episode where someone was talking about the Chicago open data portal. And it was very much one of those cases where you never quite know what impact the question you ask of the data is going to have. There's this whole thing where there's snowplow data, and folks ended up finding a pattern where, wow, it's strange. Like there's this side road
Christian Martinez (37:12.525)
Mm-hmm.
Christian Martinez (37:22.732)
Mm-hmm.
Ryan Koch (37:29.164)
that the snowplow always seems to go to first, even though there are primary roads that aren't done yet. And the algorithm is kind of like, they're supposed to do these thoroughfares first and there's a process, right? But for some reason, this street. And then it turns out it was an alderman's private residence street, they got in a bunch of trouble, and there was a whole news story about it. But you never really know what's going to come of your curiosity, I guess, is the lesson from that kind of thing.
Christian Martinez (37:34.657)
Wow.
Christian Martinez (37:39.522)
Mm-hmm.
Christian Martinez (37:45.734)
Yeah.
Christian Martinez (37:55.212)
Yeah, and I bring up potholes. That's fun. I'm interested in learning more about that because Mamdani, and he's not the first mayor to do this, had a recent pothole blitz where they fixed like 7,000 different potholes across New York City. So I'd love to see if there is any relationship, kind of like what you were talking about, in where the potholes specifically were fixed. And maybe...
In a perfect world, maybe they used New York City Open Data to find the potholes that were the most complained about. And maybe they fixed those. Maybe they didn't. Who knows? Who knows? Maybe one of your listeners will research it and find out.
Ryan Koch (38:34.309)
Yeah, actually, if one of those folks is out there, they should let me know if they build a little project for that. Because, I mean, you could even go so far as to try to extrapolate, like, where should I think there's the greatest pothole risk based on past occurrences, you know? You get into odd things about microclimates or where water flows through drainage, and I'm sure there's a lot of complex stuff that goes into how a pothole forms.
Christian Martinez (38:48.418)
Yeah.
Christian Martinez (38:58.669)
Yeah, I'm even thinking you could probably match some traffic data on that as well, because the more cars, the higher the erosion, most likely. And/or, to your point, maybe you could look at a relationship between the number of cars and flooding, because they have flood complaints as well. See if there's a relationship between potholes, flooding, and/or amount of traffic.
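One simple way to start on the relationship described above is a Pearson correlation across neighborhoods. This is a sketch in Python under stated assumptions: the per-neighborhood numbers are invented, and the variable names are hypothetical. Real inputs would come from pothole complaints, flood complaints, and traffic counts aggregated to the same areas.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-neighborhood figures, one position per neighborhood.
pothole_complaints = [120, 85, 40, 200, 65]
flood_complaints = [30, 22, 8, 51, 19]
daily_vehicles = [18000, 15000, 6000, 26000, 9000]

print("potholes vs flooding:", round(pearson(pothole_complaints, flood_complaints), 3))
print("potholes vs traffic: ", round(pearson(pothole_complaints, daily_vehicles), 3))
```

A high coefficient here would only show that the measures move together across neighborhoods. It would not show that traffic or flooding causes potholes, since busier areas may simply file more complaints of every kind.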
Ryan Koch (39:09.635)
True.
Ryan Koch (39:26.185)
There you go, I think we just spun up like three or four different project ideas for your future students.
Christian Martinez (39:27.629)
Yeah. Who knows? Maybe in the next iteration, one of the next chapters is that question exactly.
Ryan Koch (39:36.767)
That'd be fantastic. And related to this getting-involved thread, if there are folks out there who are around New York, or interested in New York, and they want to get involved with Open Data New York City, how should they go about doing that?
Christian Martinez (39:54.476)
I would say reach out, go on GitHub, see what's available, fork the repository, play around, and see what you can build. I would also say, Ryan, you inspired me during our prep call last week, because I had the same "oh man" moment I had when I got introduced to the New York State open data portal. You said, man, it'd be so cool if there were other packages related to other city or state open data portals. And I was like, I didn't even think about the fact that other states and cities have the same thing. So I'm currently trying to develop a Chicago open data package for R.
Again, a lot of the credit goes to the rOpenSci community for helping me out, but a lot of the code is transferable. So I've been able to just plug in the API JSON link, the main one, and play around. Right now I'm in the testing phase. Hopefully, maybe even later today, it'll be submitted to CRAN. But I'd love it if people took some of the code from any of the packages I have and tried to make it work for their own portals. I have noticed a lot of the portals seem to be made by the same company. They look almost identical.
If you look at the New York City Open Data Platform and the Chicago one, it's kind of the same deal.
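The idea of plugging in a different portal's API link can be sketched as a single URL builder. This is Python, not the R packages discussed, and it rests on assumptions: that both portals expose a Socrata-style `/resource/<id>.json` endpoint with a `$limit` parameter, and that `abcd-1234` is a placeholder rather than a real dataset ID. No request is sent.

```python
from urllib.parse import urlencode

# Portal domains. Treating both as Socrata-style endpoints is an assumption.
NYC_PORTAL = "data.cityofnewyork.us"
CHICAGO_PORTAL = "data.cityofchicago.org"


def build_resource_url(domain, dataset_id, limit=100):
    """Build a JSON resource URL for a portal; only the domain changes per city."""
    # safe="$" keeps the "$limit" parameter name readable in the URL.
    query = urlencode({"$limit": limit}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "abcd-1234" is a hypothetical dataset ID used only for illustration.
nyc_url = build_resource_url(NYC_PORTAL, "abcd-1234", limit=5)
chicago_url = build_resource_url(CHICAGO_PORTAL, "abcd-1234", limit=5)

print(nyc_url)
print(chicago_url)
```

If the portals do share a vendor and URL layout, most of a package's request logic could stay the same across cities, with the domain and dataset IDs as the main things to swap.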
Ryan Koch (41:29.096)
Yeah, I suppose that makes sense. I mean, the data is different, but it's the same problem to solve, right? Like, how do you take a bunch of different formats of data and reliably share it?
Christian Martinez (41:35.275)
Yeah.
Christian Martinez (41:42.027)
I know that New York State and New York City have the same vendor. Maybe Chicago, and maybe LA or Austin, also have the same vendor. Who knows? Maybe there's one monopoly of people making open data platforms.
Ryan Koch (41:55.752)
And it seems we've ended up with a surprise call to action here on the tail end, where it's like, hey, reach out if you're interested in figuring out your own municipality's or state's open data portal. And maybe you want to help put together some more packages there. Also, it sounds like you're already working on a Chicago one. So maybe the Chicagoans in the audience might want to get involved with you to help out.
Christian Martinez (41:59.745)
Yeah.
Christian Martinez (42:20.215)
Please, I'd love that. And if someone wants to help build it before I publish it, I'd gladly relieve myself of it and someone can take it on. I'd love to partner with someone on it.
Ryan Koch (42:34.262)
Cool, and then I think for those things, maybe we could put some sort of contact info thing in the show notes for folks that wanna reach out and say like, hey, hey Christian, I wanna help you out with this stuff.
Christian Martinez (42:40.898)
Yeah.
Christian Martinez (42:44.641)
Definitely.
Ryan Koch (42:47.916)
Cool, and on that note, Christian, thank you so much for joining us here on Civic Tech Chat. This was a really fun conversation, and I think folks will find interesting stuff to bring in today, interesting lessons they learned that they might be able to bring into their work or hobbies or whatever you want to call it.
Christian Martinez (43:04.203)
Yeah, I hope so. Thanks so much for having me. This was a lot of fun.
Ryan Koch (43:12.87)
Alright, hit stop. It's doing its processing thing, but I think we got it.
Ryan Koch (43:41.187)
But yeah, no, I really appreciate you coming on and doing the interview. That really was a fun conversation.
Christian Martinez (43:46.732)
Yeah, how do you think?